Introduction
Welcome to the third blog in my MLOps series. In the previous blogs, we covered the essentials of Git and GitHub and how to use Flask for building web applications. In this blog, we'll go a step further by setting up a complete project environment. We'll cover everything from creating a virtual environment to setting up a GitHub repository, creating essential files, and configuring our project for development. This setup will provide a solid foundation for your MLOps workflow, ensuring your project is well-organized and easy to manage.
Step 1: Create an Environment
First, we need to create a virtual environment for our project. (If you already created the environment while following the previous blog, just activate that one.)
Creating the Environment
Use the following command to create a new conda environment named venv with the specified Python version:
conda create -p ./venv python=3.9
Here, -p specifies the path where the environment will be created. Replace 3.9 with your preferred Python version.
Activating the Environment
Once the environment is created, activate it with:
conda activate ./venv
Your command prompt should now show the venv environment, indicating it's active. This means that all the packages you install now will be specific to this environment.
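If you want to double-check which environment is active, a quick sketch like the one below can help. It only uses the standard library; the helper name active_env_folder is my own and not part of conda. With the environment from this step activated, the last line should report a folder named venv.

```python
import sys
from pathlib import Path


def active_env_folder() -> str:
    """Return the folder name of the environment the running interpreter belongs to.

    Hypothetical helper for illustration; it simply inspects sys.prefix.
    """
    return Path(sys.prefix).name


# sys.executable is the full path of the Python binary currently running.
print("Interpreter:", sys.executable)
print("Environment folder:", active_env_folder())
```

If the folder name is something other than venv, the environment is probably not activated in this terminal.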
Step 2: Set Up a GitHub Repository
Initialize the Repository
Navigate to your project directory and initialize a new Git repository:
git init
Create a README File
Create a README.md file in the main project folder (not inside venv):
echo "# My ML Project" > README.md
Add and Commit the README File
Add the README.md file to the repository and commit it:
git add README.md
git commit -m "Initial commit with README"
Create a .gitignore File
To ensure that certain files and directories are not tracked by Git, create a .gitignore file on GitHub and choose the Python template. This will ignore files and directories commonly generated in Python projects, such as venv.
Pull Changes from GitHub
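To bring those changes from the GitHub repository to your local machine (this assumes the GitHub repository is already configured as the remote named origin and that its default branch is main):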
git pull origin main
With the .gitignore in place, your venv directory will not be tracked by Git.
Step 3: Create a requirements.txt File
Listing Dependencies
List all the libraries your project needs in a requirements.txt file, one per line. This allows others to install the same dependencies easily.
printf "numpy\npandas\nscikit-learn\n" > requirements.txt
Installing Dependencies
To install the libraries listed in requirements.txt, run:
pip install -r requirements.txt
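To confirm the installation worked, you can ask Python which versions ended up in the environment. This is just a minimal sketch using the standard importlib.metadata module (Python 3.8+); the function name installed_versions is made up for this example, and the package list mirrors the requirements.txt above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the active environment.
            result[name] = None
    return result


report = installed_versions(["numpy", "pandas", "scikit-learn"])
for name, ver in report.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any line that says NOT INSTALLED usually means pip ran in a different environment than the one you are checking.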
Step 4: Create setup.py
The setup.py script is essential for packaging our project. It automates package discovery and dependency management, making the project easier to distribute and install. The get_requirements function reads the required libraries from requirements.txt so they can be installed with the package, while the setup function provides the metadata about the package. This organization is crucial for maintaining a well-structured and easily deployable project.
Creating setup.py
Create a setup.py file with the following content:
from setuptools import find_packages, setup
from typing import List


def get_requirements(file_path: str) -> List[str]:
    '''This function returns a list of requirements for the project.'''
    requirements = []
    with open(file_path) as file_obj:
        requirements = file_obj.readlines()
        # Drop the trailing newline (and any surrounding whitespace) from each line.
        requirements = [req.strip() for req in requirements]
    return requirements


setup(
    name='mlproject01',
    version='0.1.0',
    author='Kanishk Munot',
    author_email='abc@gmail.com',
    packages=find_packages(),
    install_requires=get_requirements('requirements.txt')
)
Imports
from setuptools import find_packages, setup
from typing import List
Explanation:
from setuptools import find_packages, setup: setuptools is a package development and distribution library for Python. The find_packages function automatically discovers all packages and sub-packages in your project, while the setup function is used to define your package's attributes and metadata.
from typing import List: typing is a module that provides support for type hints. Here, List is used to specify that the function get_requirements will return a list of strings.
get_requirements Function
def get_requirements(file_path: str) -> List[str]:
    '''This function is going to return list of requirements for my project'''
    requirements = []
    with open(file_path) as file_obj:
        requirements = file_obj.readlines()
        requirements = [req.replace('\n', "") for req in requirements]
    return requirements
Explanation:
def get_requirements(file_path: str) -> List[str]: Defines a function named get_requirements that takes a file path as an argument and returns a list of strings. The type hints file_path: str and -> List[str] indicate that the input should be a string and the output will be a list of strings, respectively.
requirements = []: Initializes an empty list called requirements that will hold the package names.
with open(file_path) as file_obj: Opens the file specified by file_path and assigns it to the variable file_obj. The with statement ensures the file is properly closed after its block finishes.
requirements = file_obj.readlines(): Reads all lines from the file and stores them in the requirements list. Each line represents a package name.
requirements = [req.replace('\n', "") for req in requirements]: This list comprehension iterates over each line in requirements and removes the newline character (\n) from each line. (The full listing above uses req.strip(), which also removes any surrounding whitespace.)
return requirements: Returns the cleaned list of package names.
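To see the function in action without touching your real project files, here is a small self-contained sketch. It writes a throwaway requirements.txt into a temporary folder (the file contents are just the three example libraries from Step 3) and runs get_requirements on it.

```python
import os
import tempfile
from typing import List


def get_requirements(file_path: str) -> List[str]:
    """Read a requirements file and return its lines without trailing newlines."""
    with open(file_path) as file_obj:
        requirements = file_obj.readlines()
    return [req.replace("\n", "") for req in requirements]


# Create a temporary requirements.txt so the demo does not depend on the project.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "requirements.txt")
    with open(path, "w") as f:
        f.write("numpy\npandas\nscikit-learn\n")
    parsed = get_requirements(path)

print(parsed)  # ['numpy', 'pandas', 'scikit-learn']
```

This list is exactly what ends up in install_requires when setup.py runs.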
setup function
setup(
    name='mlproject01',
    version='0.1.0',
    author='Kanishk Munot',
    author_email='abc@gmail.com',
    packages=find_packages(),
    install_requires=get_requirements('requirements.txt')
)
Explanation:
setup(): This function is used to specify the metadata and options for your package.
name='mlproject01': Sets the name of your package.
version='0.1.0': Specifies the version of your package. Here, it is set to '0.1.0'. Versioning is crucial for package distribution and dependency management.
author='<your_name>': Provides the name of the package author.
author_email='<abc@gmail.com>': Provides the author's email address.
packages=find_packages(): Automatically discovers all packages and sub-packages in your project directory. This eliminates the need to manually specify each package.
install_requires=get_requirements('requirements.txt'): Calls the get_requirements function to read the requirements.txt file and returns a list of dependencies that need to be installed for this package.
Step 5: Create a Source Folder
To organize your code, create a folder named src and add an __init__.py file:
mkdir src
echo "" > src/__init__.py
Note: You can also create the src folder manually.
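Why the __init__.py file matters: find_packages only reports a directory as a package if it contains an __init__.py. The sketch below mimics that one rule with pathlib so you can see the effect. To be clear, discover_packages is a simplified stand-in I wrote for illustration, not setuptools' actual implementation, and the notebooks folder is a hypothetical example.

```python
import tempfile
from pathlib import Path
from typing import List


def discover_packages(root: Path, prefix: str = "") -> List[str]:
    """Simplified stand-in for find_packages.

    A directory counts as a package only if it holds an __init__.py file,
    and sub-packages are only searched for inside packages.
    """
    found: List[str] = []
    for child in sorted(root.iterdir()):
        if child.is_dir() and (child / "__init__.py").is_file():
            name = prefix + child.name
            found.append(name)
            found.extend(discover_packages(child, name + "."))
    return found


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "src").mkdir()
    (project / "src" / "__init__.py").touch()  # marks src as a package
    (project / "notebooks").mkdir()            # no __init__.py, so it is skipped
    packages = discover_packages(project)

print(packages)  # ['src']
```

If you forget the __init__.py, the list comes back empty and your package would be installed without any code in it.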
Step 6: Install Your Package
Run the following command from the project root (the folder that contains setup.py) to install your package:
pip install .
This command will install your package along with its dependencies, making it available for use in your environment.
Alternative Installation Method
You can also use pip's -e . option to install your package in editable mode. This is useful during development as it allows you to make changes to your code without reinstalling the package.
Modifying requirements.txt
Add -e . to requirements.txt:
echo "-e ." >> requirements.txt
Updating setup.py
Update setup.py so that get_requirements ignores the -e . line:
from setuptools import find_packages, setup
from typing import List

# pip's editable-install flag; it is not a package name.
HYPHEN_E_DOT = '-e .'


def get_requirements(file_path: str) -> List[str]:
    '''This function returns a list of requirements for the project.'''
    requirements = []
    with open(file_path) as file_obj:
        requirements = file_obj.readlines()
        requirements = [req.strip() for req in requirements]
        # Remove the editable-install line so it is not passed to install_requires.
        if HYPHEN_E_DOT in requirements:
            requirements.remove(HYPHEN_E_DOT)
    return requirements


setup(
    name='mlproject01',
    version='0.1.0',
    author='Kanishk Munot',
    author_email='abc@gmail.com',
    packages=find_packages(),
    install_requires=get_requirements('requirements.txt')
)
Explanation of the Change
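The updated get_requirements function removes the -e . entry from the list of requirements. That entry is an instruction for pip rather than a package name, so removing it ensures it doesn't cause issues when the list is passed to install_requires.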
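One thing to keep in mind: the function as written returns every other line as-is, so a blank line or a commented-out line such as # -e . would still end up in the list. If you'd rather filter those out too, a slightly more defensive variant could look like the sketch below. This is my own variation, not part of the setup.py above; the name read_requirements and the sample file contents are made up for the demo.

```python
import os
import tempfile
from typing import List

EDITABLE_FLAG = "-e ."  # pip's editable-install line, not a real dependency


def read_requirements(file_path: str) -> List[str]:
    """Return requirement lines, skipping blanks, comments and the '-e .' line."""
    requirements: List[str] = []
    with open(file_path) as file_obj:
        for line in file_obj:
            req = line.strip()
            if not req or req.startswith("#") or req == EDITABLE_FLAG:
                continue
            requirements.append(req)
    return requirements


# Demo on a throwaway file containing a blank line, a comment and '-e .'.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "requirements.txt")
    with open(path, "w") as f:
        f.write("numpy\npandas\n\n# -e .\n-e .\n")
    cleaned = read_requirements(path)

print(cleaned)  # ['numpy', 'pandas']
```

With this version, it doesn't matter whether -e . is present, commented out, or deleted in requirements.txt.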
Final Steps
Comment out the -e . entry in requirements.txt for now. We will use it in upcoming blogs.
Commit all changes to the repository:
git add .
git commit -m "Set up project environment and configuration"
git push origin main
Conclusion
In this blog, we covered the essential steps to set up a robust project environment for your MLOps workflow, including configuring setup.py to package our application. We value your feedback! Please share your thoughts and questions in the comments.
Read the next blog in the series on setting up template.py here.