DO: code, configs (but no secrets!), server scripts, documentation
DO: gitignore, workflows, other github files
FINE: log books, relevant resources
FINE: (not sensitive!!) sample data for demo purposes
CAREFUL: notebook output
DON'T: data, large files
DON'T: secrets, names (employees, customers, ...), anything sensitive
TODO: Alternatives for data hosting???
Use the python-dotenv package combined with a .env file.
.env:
API_KEY=test-key
API_SECRET=test-secret
Note: Do not add the .env file to your repository!
dotenv package:
from dotenv import load_dotenv
import os

# Read the key=value pairs from .env into the process environment
load_dotenv()
# os.getenv returns None when a variable is not set
api_key = os.getenv("API_KEY")
api_secret = os.getenv("API_SECRET")
Alternatively, put the secrets in a manually created config file (for example a JSON or YAML file) and load this file in your code. Do not include this config file in the repository.
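A minimal sketch of the config-file option, assuming a JSON file. The file name `config.json` and the helper `load_config` are hypothetical, chosen only for illustration; the keys mirror the .env example.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a JSON config file and return its contents as a dict."""
    config_path = Path(path)
    if not config_path.exists():
        # Fail with a hint, since the file is deliberately kept out of the repository
        raise FileNotFoundError(
            f"Config file {config_path} not found. Create it locally; "
            "it is not part of the repository."
        )
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)
```

Usage would look like `config = load_config("config.json")` followed by `api_key = config["API_KEY"]`. Adding the file name to .gitignore keeps it from being committed by accident.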
In the project documentation, list which secrets need to be set. Nothing is more annoying than running a script ten times and having it fail each time on the next missing secret.
For example, create a sample .env, .json or .yaml file with the right structure but bogus values for the secrets.
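To avoid the fail-one-secret-at-a-time problem in code as well, a script can check all required secrets up front and report every missing one in a single error. This is a sketch, not a prescribed approach: `REQUIRED_SECRETS`, `find_missing_secrets` and `check_secrets` are hypothetical names, and the two secret names are taken from the .env example.

```python
import os

# Hypothetical list; replace with the secrets your project actually needs.
REQUIRED_SECRETS = ["API_KEY", "API_SECRET"]


def find_missing_secrets(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def check_secrets(required=REQUIRED_SECRETS, environ=None):
    """Raise one error that lists every missing secret at once."""
    missing = find_missing_secrets(required, environ)
    if missing:
        raise RuntimeError("Missing secrets: " + ", ".join(missing))
```

Calling `check_secrets()` right after `load_dotenv()` makes the script fail once, at startup, with the full list.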
Either clone the repository locally:
git clone git@github.com:Amsterdam-Internships/InternshipAmsterdamGeneral.git
Or, if you have an existing project:
git init
git remote add origin git@github.com:Amsterdam-Internships/GithubDemo.git
git push --set-upstream origin master
TODO: Add convention advanced analytics
AI Team Guidelines???
pip install -r requirements.txt
Example requirements.txt entry: requests==2.28.2
pip freeze > requirements.txt
pipdeptree --warn silence | grep -E '^\w+' > requirements.txt
poetry in combination with a pyproject.toml file [link]. Takes care of dependencies, virtual environment management, and building your code into a package.
TODO: Elaborate on conda usage
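A small sketch for checking that a requirements.txt pins exact versions, in the style of requests==2.28.2. The helper `find_unpinned` is hypothetical and deliberately simple: it only looks for `==` and does not handle URLs, extras or environment markers.

```python
def find_unpinned(requirements_text):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt"
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

For the text `"requests==2.28.2\njupyter\n"` this returns `["jupyter"]`, since only the first line is pinned.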
Disclaimer: free repos have a limit of 2000 minutes + 500 MB of storage
TODO: Dealing with notebooks
TODO: General pre-commit hooks
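One possible direction for the two TODOs above is to clear notebook output before committing, for example from a pre-commit hook. The sketch below assumes the nbformat 4 JSON layout, where code cells carry `outputs` and `execution_count`. The function names are hypothetical, and dedicated tools such as nbstripout cover more cases.

```python
import json


def strip_outputs(notebook):
    """Clear outputs and execution counts of code cells in place; return the notebook."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_outputs_in_file(path):
    """Rewrite the .ipynb file at `path` without cell outputs."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    strip_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```

Markdown cells are left untouched, so the narrative of the notebook survives while possibly sensitive output does not reach the repository.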
Recommendations Meeke:
When using flake8 and black in parallel, you may need to add a .flake8 file to exclude some checks from flake8, as they clash with black.
pylint ...
Awesome talk on telling stories through your commits
Whenever in doubt: google "what's a good commit message"
!git config --local --list
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
remote.origin.url=git@github.com:Amsterdam-Internships/GithubDemo.git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
branch.master.remote=origin
branch.master.merge=refs/heads/master
!git config --global --list
user.email=iva.gornishka@gmail.com
user.name=Iva Gornishka
credential.helper=store
!echo "jupyter" > requirements.txt
!git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   GithubBasics.ipynb
Untracked files:
  (use "git add <file>..." to include in what will be committed)
    GithubBasics-Meeke.ipynb
    convert-slides.sh
    requirements.txt
no changes added to commit (use "git add" and/or "git commit -a")
!git add requirements.txt
!git status
On branch master
Your branch is up to date with 'origin/master'.
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    new file:   requirements.txt
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   GithubBasics.ipynb
Untracked files:
  (use "git add <file>..." to include in what will be committed)
    GithubBasics-Meeke.ipynb
    convert-slides.sh
!git commit -m 'updated requirements (jupyter)'
[master ec97591] updated requirements (jupyter)
 1 file changed, 1 insertion(+)
 create mode 100644 requirements.txt
!git push
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 12 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 308 bytes | 308.00 KiB/s, done.
Total 3 (delta 1), reused 1 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:Amsterdam-Internships/GithubDemo.git
   af23740..ec97591  master -> master
!echo "cookiecutter" >> requirements.txt
!git add requirements.txt
!git commit --amend --no-edit
[master de8c65a] updated requirements (jupyter)
 Date: Wed Mar 15 12:58:12 2023 +0100
 1 file changed, 2 insertions(+)
 create mode 100644 requirements.txt
!git push -f origin
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 12 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 323 bytes | 323.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:Amsterdam-Internships/GithubDemo.git
 + ec97591...de8c65a master -> master (forced update)
!git reset --mixed HEAD~1
!git push -f
Unstaged changes after reset:
M    GithubBasics.ipynb
Total 0 (delta 0), reused 0 (delta 0)
To github.com:Amsterdam-Internships/GithubDemo.git
 + de8c65a...af23740 master -> master (forced update)
!git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   GithubBasics.ipynb
Untracked files:
  (use "git add <file>..." to include in what will be committed)
    GithubBasics-Meeke.ipynb
    convert-slides.sh
    requirements.txt
no changes added to commit (use "git add" and/or "git commit -a")
Four-eyes principle.
Make sure the code...
It can bring to light workflow/process issues
Helps us learn from each other
Sidenote for Data Scientists @ Gemeente: It will become "a thing" soon, so we should help with setting up the standards and process
TODO:
hardcoded paths
output in notebooks
data in the repo
magic numbers & unnamed/positional arguments (SomeRandomModel('l2', False, 0.0001, 1.0, True, 100))
random seeds
no grid search or explanation of how params came to be
overall workflow/pipeline issue
blameless reviews
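Two of the review points above, magic numbers with positional arguments and random seeds, can be illustrated with a short sketch. Everything in it is hypothetical (the function `split_train_test`, the constants and their values); it only shows named constants, keyword-only arguments and an explicit seed.

```python
import random

# Named constants instead of magic numbers; the values are illustrative only.
RANDOM_SEED = 42
TEST_FRACTION = 0.2


def split_train_test(items, *, test_fraction, seed):
    """Shuffle a copy of `items` reproducibly and split it into (train, test) lists."""
    # A local generator leaves the global random state untouched
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Keyword-only arguments force the call site to say what each value means
train_items, test_items = split_train_test(
    range(10), test_fraction=TEST_FRACTION, seed=RANDOM_SEED
)
```

Compared with a call like SomeRandomModel('l2', False, 0.0001, 1.0, True, 100), a reviewer can read this call without opening the function definition, and rerunning it with the same seed gives the same split.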