Git and GitHub Essentials

Introduction to the MLOps Blog Series

Hey there! Welcome to the MLOps blog series. In this series, my goal is to give you a solid understanding of MLOps and the tools you need to simplify and automate your machine learning processes. We’ll cover a variety of topics, starting with the basics of version control using Git and GitHub, moving through managing environments, continuous integration and deployment, and ending with monitoring and maintaining models in production. Each blog post will build on the previous one, guiding you step-by-step through MLOps.

Getting Started

In this first blog, we’ll start with Git and GitHub to introduce you to the basics of version control, laying the groundwork for more advanced topics. By the end of this series, you’ll have a strong understanding of how to implement effective MLOps practices in your projects.

What is MLOps?

MLOps, or Machine Learning Operations, is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. It combines Machine Learning (ML) with DevOps principles to automate and streamline the entire ML lifecycle, from data preparation and model training to deployment and monitoring. MLOps ensures that machine learning models are continuously integrated, tested, and deployed in a reproducible manner, making it easier to manage and scale ML workflows.

Why Git and GitHub as the First Step?

Git is a version control system that helps you track changes in your code, collaborate with others, and manage different versions of your projects. GitHub is a web-based platform that hosts Git repositories and adds collaboration tools on top of them. Starting with Git and GitHub is crucial for any MLOps pipeline because together they give you:

  • Version Control: Keep track of changes to your code and data.

  • Collaboration: Work seamlessly with other team members.

  • Backup: Store your code and data safely in the cloud.

  • Continuous Integration: Automate testing and deployment of your models.

Install Git and Set Up GitHub

  1. Install Git: Visit the official Git website and download the appropriate version for your operating system. Follow the installation instructions.

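  2. Set up GitHub: GitHub itself runs in your web browser and needs no installation, so all you need is an account. If you prefer a graphical client, you can also install GitHub Desktop from the GitHub Desktop website for a more integrated experience.

After installation, open Command Prompt (CMD) or PowerShell and type git --version to verify the installation. If Git is installed correctly, it prints the installed version.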

Create a New Repository on GitHub

  1. Go to GitHub and log in to your account.

  2. Click on the "+" icon in the top right corner and select "New repository".

  3. Name your repository and provide a description.

  4. Click "Create repository".

Git Cheat Sheet for Common Commands

Refer to the Git cheat sheet for commonly used commands.

Initialize Git in Your Project

  1. Create a new folder for your project.

  2. Inside the folder, create a new file named README.md and add a description of your project.

  3. Open CMD or PowerShell in the project folder and initialize Git by typing:

     git init
    

    If your editor has Git integration (VS Code, for example), you might see a 'U' symbol next to README.md, indicating that the file is untracked.

  4. Track the file by typing:

     git add README.md
    
  5. Check the status of tracked files:

     git status
    
  6. Commit the changes:

     git commit -m "Initial commit"
    
  7. Rename your branch to main:

     git branch -M main
    
  8. Verify the branch name:

     git branch
    
  9. Add the remote repository link:

     git remote add origin <link-of-your-repo>
    
  10. Verify the remote repository:

    git remote -v
    
  11. Push the changes to GitHub:

    git push -u origin main
    
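  12. Sign in to GitHub when prompted. GitHub no longer accepts your account password for Git operations over HTTPS, so sign in through the browser prompt if one appears, or use a personal access token in place of the password. If Git refused to commit at step 6 because it did not know who you are, set your username and email once and then repeat the commit: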

    git config --global user.name "<yourname>"
    git config --global user.email "<youremail>"
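
To try the steps above end to end without touching a real account, here is a minimal sketch that runs them in a throwaway directory. It rests on a few assumptions that are not part of the original walkthrough: a local bare repository (remote.git) stands in for GitHub, the project folder is called demo-project, and the name and email are placeholders. It also sets the identity before the first commit, since Git needs it at that point.

```shell
set -eu

# Work in a throwaway directory so nothing on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for the GitHub repository (assumption: a local bare repo).
git init --quiet --bare remote.git

# Steps 1-3: project folder, README, and git init.
mkdir demo-project
cd demo-project
echo "# Demo project" > README.md
git init --quiet

# Identity for this repository only; use --global to set it machine-wide.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Steps 4-8: stage, check, commit, rename the branch, and confirm its name.
git add README.md
git status --short
git commit --quiet -m "Initial commit"
git branch -M main
git branch

# Steps 9-11: register the remote, check it, and push.
git remote add origin "$workdir/remote.git"
git remote -v
git push --quiet -u origin main
```

With a real project, the only line that changes is the remote: replace the local path with the link of your GitHub repository.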
    

Managing Changes

Whenever you change a tracked file, editors with Git integration show an "M" symbol next to it, indicating that the file is modified. To publish the change, repeat the same three commands: git add, git commit, and git push.
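
As a rough illustration, the sketch below edits a tracked file in a scratch repository and commits the change. The file contents and identity are placeholders, and the push is left out because this scratch repository has no remote; in your own project you would finish with git push.

```shell
set -eu

# Scratch repository with one committed file.
cd "$(mktemp -d)"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first line" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Edit the tracked file; the short status flags it with M (modified).
echo "second line" >> README.md
git status --short

# Stage and commit the edit; git push would then publish it.
git add README.md
git commit --quiet -m "Update README"

# Nothing is printed now, because the working tree is clean again.
git status --short
```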

Pull Changes from GitHub

If you create a file (e.g., .gitignore) directly on GitHub, you can pull the changes to your local repository:

  1. In your repository on GitHub, click on the '+' icon to create a new file.

  2. Type .gitignore as the file name.

  3. Open the .gitignore template chooser, type "python," and then click on "Commit changes."

This will create a .gitignore file in your GitHub repository. To see this file on your local machine, you need to pull the changes by typing the following command:

     git pull origin main
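
If you want to watch a pull happen without using GitHub, the following sketch reproduces the situation locally. Everything here is an assumption made for illustration: a bare repository (remote.git) plays the role of GitHub, and a second clone (web-ui) plays the role of the file you created in the browser.

```shell
set -eu
workdir="$(mktemp -d)"
cd "$workdir"

# Stand-in for GitHub (assumption: a local bare repository).
git init --quiet --bare remote.git

# Your local repository, with one commit pushed to the remote.
git init --quiet local
cd local
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Demo" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git branch -M main
git remote add origin "$workdir/remote.git"
git push --quiet -u origin main

# Simulate creating .gitignore in the GitHub web UI via a second clone.
cd "$workdir"
git clone --quiet --branch main remote.git web-ui
cd web-ui
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "__pycache__/" > .gitignore
git add .gitignore
git commit --quiet -m "Create .gitignore"
git push --quiet origin main

# Back in the local repository, the new file arrives with a pull.
cd "$workdir/local"
git pull --quiet origin main
ls -a
```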

.gitignore

The .gitignore file specifies which files and directories to ignore in the Git repository. This is important to avoid pushing sensitive or unnecessary files, such as environment configurations or large datasets.
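
Here is a small sketch of how ignore rules behave, using a scratch repository. The two rules and the file names (.env, data/train.csv, train.py) are hypothetical examples rather than a recommended list; git check-ignore is the command that tells you whether a path is matched by a rule.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet

# Hypothetical ignore rules: a secrets file and a folder of large datasets.
printf '%s\n' '.env' 'data/' > .gitignore

mkdir data
echo "API_KEY=not-a-real-key" > .env
echo "1,2,3" > data/train.csv
echo "print('hello')" > train.py

# check-ignore prints each path that an ignore rule matches.
git check-ignore .env data/train.csv

# Only .gitignore and train.py remain visible to Git.
git status --short
```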

Adding conda environment files to .gitignore

Conda is an open-source package management and environment management system that allows you to create isolated environments for your projects. This is crucial for managing dependencies and ensuring reproducibility.

  1. Create a new Conda environment:

     conda create -p venv python -y
    

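    This command creates a new environment in a folder named venv inside the current directory, with Python installed. Because -p sets a path rather than a name, you activate it with conda activate ./venv.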

  2. If you created the .gitignore file by hand on your local machine instead of using GitHub's Python template, add a venv/ line to it so the environment is not pushed to GitHub. The environment folder is large and can be recreated at any time, so it does not belong in the repository.
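
One way to add the rule from the command line is sketched below. Since Conda may not be available everywhere, an empty venv folder stands in for the real environment (an assumption made only so the example runs on its own); the grep guard keeps the rule from being added twice.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet

# Stand-in for the folder that `conda create -p venv python -y` would create.
mkdir -p venv/bin
touch venv/bin/python

# Append the rule only if it is not already present.
touch .gitignore
grep -qxF 'venv/' .gitignore || echo 'venv/' >> .gitignore

# Git now skips everything under venv/.
git check-ignore venv/bin/python
git status --short
```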

Git Log

The git log command shows the commit history of the repository. It is important for tracking changes and understanding the project's evolution.
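
A few variations of git log are worth knowing. The sketch below builds a scratch repository with two commits (the file name and commit messages are placeholders) and then views its history in three ways.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
echo "v2" >> notes.txt
git commit --quiet -am "Extend notes"

# One line per commit, newest first.
git log --oneline

# Full details of the most recent commit only.
git log -n 1

# History restricted to a single file.
git log --oneline -- notes.txt
```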

Branching in Git

Branching allows you to work on different features or fixes in isolation. This is especially important when collaborating with others.

  1. Create a new branch:

     git branch <branch_name>
    
  2. Switch to the new branch:

     git checkout <branch_name>
    
  3. Make changes and commit them to the new branch.

  4. To merge branches:

     git checkout main
     git merge <branch_name>
    
  5. To delete a branch:

     git branch -d <branch_name>
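
To see the five steps work together, here is a minimal sketch in a scratch repository. The branch name feature-tuning and the file model.txt are made up for the example.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "baseline" > model.txt
git add model.txt
git commit --quiet -m "Initial commit"
git branch -M main

# Steps 1-3: create the branch, switch to it, and commit a change there.
git branch feature-tuning
git checkout --quiet feature-tuning
echo "tuned" >> model.txt
git commit --quiet -am "Tune model settings"

# Step 4: bring the work back into main.
git checkout --quiet main
git merge --quiet feature-tuning

# Step 5: the branch is merged, so -d removes it without complaint.
git branch -d feature-tuning
git branch
```

The lowercase -d is the safe choice here: Git refuses to delete a branch with it if the branch still holds unmerged work.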
    

Resolving Conflicts

When multiple people work on the same codebase, conflicts can occur. This usually happens when two or more people make changes to the same file or lines of code in their respective branches. Git will notify you of these conflicts when you try to merge the branches. Here’s how to resolve conflicts:

  1. Identify the Conflict: Git will mark the files with conflicts and show the conflicting changes. You can see the conflicts by running:

     git status
    

    Conflicted files will be listed as "both modified."

  2. Open the Conflicted File: Open the file(s) with conflicts in your code editor. You will see sections of the file marked with conflict markers:

     <<<<<<< HEAD
     Your changes here
     =======
     Their changes here
     >>>>>>> branch-name
    
  3. Edit the File: Resolve the conflict by editing the file. Decide which changes to keep or how to merge them. Remove the conflict markers (<<<<<<<, =======, and >>>>>>>) after resolving the conflict.

  4. Add the Resolved File: After resolving the conflicts, add the resolved file to the staging area:

     git add <file_name>
    
  5. Commit the Changes: Commit the resolved changes:

     git commit -m "Resolved merge conflict in <file_name>"
    

By following these steps, you can effectively resolve conflicts that arise when multiple contributors make changes to the same parts of a codebase. This ensures that everyone’s contributions are integrated smoothly and the project continues to progress without issues.
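
If you would like to practice before it happens in a real project, the sketch below creates a conflict on purpose and then resolves it. The file config.txt, the branch name experiment, and the learning-rate values are invented for the example, and the "edit" in steps 2-3 is reduced to overwriting the file with the version we decide to keep.

```shell
set -eu
cd "$(mktemp -d)"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "learning_rate = 0.1" > config.txt
git add config.txt
git commit --quiet -m "Initial commit"
git branch -M main

# Change the same line on two branches to force a conflict.
git checkout --quiet -b experiment
echo "learning_rate = 0.01" > config.txt
git commit --quiet -am "Lower learning rate"
git checkout --quiet main
echo "learning_rate = 0.5" > config.txt
git commit --quiet -am "Raise learning rate"

# The merge stops with a conflict, so its non-zero exit status is expected.
git merge experiment || true

# Step 1: the file shows up as unmerged (UU in the short format).
git status --short

# Steps 2-3: keep the experiment value, which also removes the markers.
echo "learning_rate = 0.01" > config.txt

# Steps 4-5: stage the resolved file and conclude the merge.
git add config.txt
git commit --quiet -m "Resolved merge conflict in config.txt"
```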

Conclusion

In this blog, we've covered the essential steps to integrate Git and GitHub into your MLOps workflow. From installing Git and setting up GitHub to initializing a repository, managing branches, and resolving conflicts, you've learned the foundational skills needed for version control in your machine learning projects. These practices will help you collaborate effectively, maintain code integrity, and ensure reproducibility.

In the next blog, we’ll learn about the fundamentals of the Flask framework. Read the next blog here.
