Table of contents
- Introduction to the MLOps Blog Series
- Getting Started
- What is MLOps?
- Why Git and GitHub as the First Step?
- Install Git & GitHub
- Create a New Repository on GitHub
- Git Cheat Sheet for Common Commands
- Initialize Git in Your Project
- Managing Changes
- Pull Changes from GitHub
- .gitignore
- Adding conda environment files to .gitignore
- Git Log
- Branching in GitHub
- Resolving Conflicts
- Conclusion
Introduction to the MLOps Blog Series
Hey there! Welcome to the MLOps blog series. In this series, my goal is to give you a solid understanding of MLOps and the tools you need to simplify and automate your machine learning processes. We’ll cover a variety of topics, starting with the basics of version control using Git and GitHub, moving through managing environments, continuous integration and deployment, and ending with monitoring and maintaining models in production. Each blog post will build on the previous one, guiding you step-by-step through MLOps.
Getting Started
In this first blog, we’ll start with Git and GitHub to introduce you to the basics of version control, laying the groundwork for more advanced topics. By the end of this series, you’ll have a strong understanding of how to implement effective MLOps practices in your projects.
What is MLOps?
MLOps, or Machine Learning Operations, is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. It combines Machine Learning (ML) with DevOps principles to automate and streamline the entire ML lifecycle, from data preparation and model training to deployment and monitoring. MLOps ensures that machine learning models are continuously integrated, tested, and deployed in a reproducible manner, making it easier to manage and scale ML workflows.
Why Git and GitHub as the First Step?
Git is a version control system that helps you track changes in your code, collaborate with others, and manage different versions of your projects. GitHub is a web-based platform that provides hosting for software development and version control using Git. Starting with Git and GitHub is crucial for any MLOps pipeline because it allows you to:
Version Control: Keep track of changes to your code and data.
Collaboration: Work seamlessly with other team members.
Backup: Store your code and data safely in the cloud.
Continuous Integration: Automate testing and deployment of your models whenever code is pushed to GitHub (for example, with GitHub Actions).
Install Git & GitHub
Install Git: Visit the official Git website and download the appropriate version for your operating system. Follow the installation instructions.
Set up GitHub: GitHub is a web-based platform, so you only need an account, which you can create on the GitHub website. Optionally, install GitHub Desktop from the GitHub Desktop website for a more integrated experience.
After installation, open Command Prompt (CMD) or PowerShell and type git --version to verify the installation. It should print the installed Git version.
Create a New Repository on GitHub
Go to GitHub and log in to your account.
Click on the "+" icon in the top right corner and select "New repository".
Name your repository and provide a description.
Click "Create repository".
Git Cheat Sheet for Common Commands
Refer to the Git cheat sheet for commonly used commands.
Initialize Git in Your Project
Create a new folder for your project.
Inside the folder, create a new file named README.md and add a description of your project.
Open CMD or PowerShell in the project folder and initialize Git by typing:
git init
If you use an editor such as VS Code, you might see a 'U' next to README.md, indicating that the file is untracked.
Stage the file so Git starts tracking it:
git add README.md
Check the status; README.md should now be listed under "Changes to be committed":
git status
Commit the changes:
git commit -m "Initial commit"
Rename your branch to main:
git branch -M main
Verify the branch name:
git branch
Add the remote repository link:
git remote add origin <link-of-your-repo>
Verify the remote repository:
git remote -v
Push the changes to GitHub:
git push -u origin main
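When prompted, sign in to GitHub. GitHub no longer accepts your account password for Git operations over HTTPS, so use the browser sign-in offered by Git Credential Manager (bundled with Git for Windows) or a personal access token. Git also needs a name and email to record on your commits; ideally set them once, before your first commit:
git config --global user.name "<yourname>"
git config --global user.email "<youremail>"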
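The whole sequence above can be run end to end as a script. This is a minimal sketch that assumes a throwaway directory and a local bare repository standing in for GitHub, so no network access or sign-in is needed; REMOTE, PROJECT and the example identity are placeholders. It sets the identity before the first commit because on a fresh machine git commit may fail without one.

```shell
set -e

WORKDIR="$(mktemp -d)"
REMOTE="$WORKDIR/remote.git"          # placeholder for <link-of-your-repo>
PROJECT="$WORKDIR/my-mlops-project"   # hypothetical project folder

# A bare repository plays the role of the empty GitHub repository
git init --bare --quiet "$REMOTE"

mkdir "$PROJECT"
cd "$PROJECT"
echo "# My MLOps project" > README.md

git init --quiet
# Placeholder identity, set only for this repository
git config user.name "Example User"
git config user.email "user@example.com"

git add README.md
git status --short                    # "A  README.md": staged, not yet committed
git commit --quiet -m "Initial commit"
git branch -M main                    # rename the current branch to main
git branch                            # "* main"
git remote add origin "$REMOTE"
git remote -v                         # fetch and push URLs for origin
git push --quiet -u origin main       # -u makes origin/main the upstream
```

On your own machine, replace "$REMOTE" with your GitHub repository URL and skip the per-repository git config lines if you have already set a global identity.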
Managing Changes
Whenever you change a tracked file, you will see the "M" symbol (in editors such as VS Code, or in the output of git status --short), indicating that the file is modified. To push the changes, repeat the steps from git add through git commit to git push.
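Here is the modify, stage, commit cycle in a terminal. This sketch uses a throwaway repository with a placeholder identity and no remote, so the final git push is only a comment. git status --short prints the same "M" marker that editors display.

```shell
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "# Project" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Edit the tracked file
echo "Trains a baseline model." >> README.md
git status --short          # " M README.md": modified, not yet staged
git --no-pager diff         # line-by-line view of the change

git add README.md
git commit --quiet -m "Describe the project"
# git push                  # omitted: this sandbox has no remote
```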
Pull Changes from GitHub
If you create a file (e.g., .gitignore) directly on GitHub, you can pull the changes into your local repository:
Click the '+' icon to create a new file.
Type .gitignore as the file name.
Click "Choose .gitignore template", type "Python", select the template, and then click "Commit changes". This creates a .gitignore file in your GitHub repository. To see this file on your local machine, pull the changes with:
git pull origin main
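To try the pull workflow without touching a real GitHub repository, this sketch uses a local bare repository as the remote and a second clone to play the role of the GitHub web editor. All paths, the example identity, and the two-line .gitignore (a small stand-in for GitHub's full Python template) are assumptions.

```shell
set -e
# Placeholder identity for every repository in this sketch
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORKDIR="$(mktemp -d)"
REMOTE="$WORKDIR/remote.git"            # stands in for the GitHub repository
git init --bare --quiet "$REMOTE"

# Your local repository, pushed once
git init --quiet "$WORKDIR/local"
cd "$WORKDIR/local"
echo "# Project" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git branch -M main
git remote add origin "$REMOTE"
git push --quiet -u origin main

# Simulate creating .gitignore in the GitHub web UI
git clone --quiet --branch main "$REMOTE" "$WORKDIR/web"
cd "$WORKDIR/web"
printf '__pycache__/\n*.py[cod]\n' > .gitignore
git add .gitignore
git commit --quiet -m "Create .gitignore"
git push --quiet origin main

# Back on your machine: fetch the new commit and merge it into main
cd "$WORKDIR/local"
git pull --quiet origin main
```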
.gitignore
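The .gitignore file specifies which files and directories Git should ignore. This is important to avoid pushing sensitive or unnecessary files, such as environment configurations or large datasets. Note that .gitignore only affects untracked files: a file that is already committed stays tracked until you remove it from the index with git rm --cached <file_name>.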
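The patterns below are illustrative assumptions for a typical Python ML project, not GitHub's template. git check-ignore -v is handy for debugging: for each ignored path it prints the .gitignore file, line number, and pattern responsible.

```shell
set -e
cd "$(mktemp -d)"
git init --quiet

# Example patterns (assumptions; adapt to your project)
cat > .gitignore <<'EOF'
# Python bytecode
__pycache__/
*.pyc
# Secrets and local configuration
.env
# Large raw data kept out of Git
data/raw/
EOF

mkdir -p data/raw src
touch .env data/raw/train.csv src/train.py

# Which rule ignores each path? (exits non-zero if none are ignored)
git check-ignore -v .env data/raw/train.csv || true

# Only .gitignore and src/train.py should be listed as untracked
git status --short --untracked-files=all
```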
Adding conda environment files to .gitignore
Conda is an open-source package management and environment management system that allows you to create isolated environments for your projects. This is crucial for managing dependencies and ensuring reproducibility.
Create a new Conda environment:
conda create -p venv python -y
This command creates a new environment in a folder named venv inside your project directory (the -p flag sets the environment's path) with Python installed. If you created the .gitignore file manually on your local machine, add the venv folder to it so the environment is not pushed to GitHub. An environment holds many platform-specific files that can be recreated on each machine, so they do not belong in the repository.
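A sketch of that last step. Since conda may not be installed where you try this, an empty venv folder stands in for the one conda create -p venv python -y would make; everything else uses only Git and standard shell commands.

```shell
set -e
cd "$(mktemp -d)"
git init --quiet

# Stand-in for: conda create -p venv python -y
mkdir -p venv/bin
touch venv/bin/python

# Append the environment folder to .gitignore (creates the file if missing).
# The leading newline guards against a file that lacks a final newline.
printf '\nvenv/\n' >> .gitignore

# venv/ no longer shows up; only .gitignore is untracked
git status --short --untracked-files=all
```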
Git Log
The git log command shows the commit history of the repository, newest first. It is useful for reviewing what changed, when, and by whom, and for understanding how the project has evolved.
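A few git log variants that are useful in practice, shown in a throwaway repository with three hypothetical commits. --no-pager only stops Git from opening a pager, so the output prints directly.

```shell
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# Three hypothetical commits
for step in "Add data loader" "Train baseline model" "Evaluate model"; do
  echo "$step" >> CHANGELOG.txt
  git add CHANGELOG.txt
  git commit --quiet -m "$step"
done

git --no-pager log --oneline               # one line per commit, newest first
git --no-pager log -n 1 --stat             # latest commit and the files it touched
git --no-pager log --oneline --graph --all # branch structure as ASCII art
```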
Branching in GitHub
Branching allows you to work on different features or fixes in isolation. This is especially important when collaborating with others.
Create a new branch:
git branch <branch_name>
Switch to the new branch:
git checkout <branch_name>
Make changes and commit them to the new branch.
To merge your branch into main, switch back to main and merge:
git checkout main
git merge <branch_name>
To delete a branch:
git branch -d <branch_name>
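The branching steps above as one runnable sketch. The branch name feature/data-validation and the validate.py file are hypothetical. Because main gets no new commits while the branch exists, the merge is a fast-forward.

```shell
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "# Project" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git branch -M main

BRANCH="feature/data-validation"          # hypothetical branch name
git branch "$BRANCH"                      # create the branch
git checkout --quiet "$BRANCH"            # switch to it
echo "def validate(df): ..." > validate.py
git add validate.py
git commit --quiet -m "Add data validation stub"

git checkout --quiet main
git merge --quiet "$BRANCH"               # fast-forward merge into main
git branch -d "$BRANCH"                   # safe delete: branch is merged
```

On Git 2.23 or newer, git switch <branch_name> and git switch -c <branch_name> are alternatives to git checkout for changing and creating branches. git branch -d refuses to delete a branch that has not been merged; git branch -D forces the deletion.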
Resolving Conflicts
When multiple people work on the same codebase, conflicts can occur. This usually happens when two or more people make changes to the same file or lines of code in their respective branches. Git will notify you of these conflicts when you try to merge the branches. Here’s how to resolve conflicts:
Identify the Conflict: Git will mark the files with conflicts and show the conflicting changes. You can see the conflicts by running:
git status
Conflicted files will be listed as "both modified."
Open the Conflicted File: Open the file(s) with conflicts in your code editor. You will see sections of the file marked with conflict markers:
<<<<<<< HEAD
Your changes here
=======
Their changes here
>>>>>>> branch-name
Edit the File: Resolve the conflict by editing the file. Decide which changes to keep or how to combine them, then remove the conflict markers (<<<<<<<, =======, and >>>>>>>).
Add the Resolved File: After resolving the conflicts, add the resolved file to the staging area:
git add <file_name>
Commit the Changes: Commit the resolved changes:
git commit -m "Resolved merge conflict in <file_name>"
By following these steps, you can effectively resolve conflicts that arise when multiple contributors make changes to the same parts of a codebase. This ensures that everyone’s contributions are integrated smoothly and the project continues to progress without issues.
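To practise before it happens on a real project, this sketch manufactures a conflict on purpose: two branches set a hypothetical learning_rate line in config.txt to different values, so the merge stops. It then walks through the resolution steps described above. Which value to keep is an arbitrary choice here.

```shell
set -e
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "learning_rate = 0.01" > config.txt
git add config.txt
git commit --quiet -m "Add config"
git branch -M main

# Branch 1 changes the line...
git checkout --quiet -b tune-lr
echo "learning_rate = 0.001" > config.txt
git commit --quiet -am "Lower learning rate"

# ...and main changes the same line differently
git checkout --quiet main
echo "learning_rate = 0.05" > config.txt
git commit --quiet -am "Raise learning rate"

git merge tune-lr || true                 # stops with a conflict, non-zero exit
git status --short                        # "UU config.txt": both modified
git diff --name-only --diff-filter=U      # files that still have conflicts
cat config.txt                            # shows the conflict markers

# Resolve: write the final content, markers removed (keeping the branch's value)
echo "learning_rate = 0.001" > config.txt
git add config.txt
git commit --quiet -m "Resolved merge conflict in config.txt"
```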
Conclusion
In this blog, we've covered the essential steps to integrate Git and GitHub into your MLOps workflow: installing Git and setting up GitHub, initializing a repository, managing changes, working with branches, and resolving conflicts. These are the foundational version control skills for your machine learning projects, and they will help you collaborate effectively, maintain code integrity, and ensure reproducibility.
In the next blog, we'll learn the fundamentals of the Flask framework. Read the next blog here.