Project setup with setup.py

Introduction

Welcome to the third blog in my MLOps series. In the previous blogs, we covered the essentials of Git and GitHub and how to use Flask for building web applications. In this blog, we'll go a step further by setting up a complete project environment. We'll cover everything from creating a virtual environment to setting up a GitHub repository, creating essential files, and configuring our project for development. This setup will provide a solid foundation for your MLOps workflow, ensuring your project is well-organized and easy to manage.

Step 1: Create an Environment

First, we need to create a virtual environment for our project. (If you already created the environment by following the previous blog, just activate that one.)

Creating the Environment

Use the following command to create a new conda environment in a folder named venv inside your project directory, using the specified Python version:

conda create -p ./venv python=3.9

Here, -p specifies the path where the environment will be created. Replace 3.9 with your preferred Python version.

Activating the Environment

Once the environment is created, activate it with:

conda activate ./venv

Your command prompt should now show the venv environment, indicating it's active. Any packages you install from now on will be installed into this environment only.
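If you want to confirm from Python that the interpreter really comes from the project environment, one option is to compare sys.prefix (which, for a conda environment, points at the environment directory) with the ./venv path. This is a minimal sketch that assumes you run it from the project root and that the environment lives in ./venv as created above; the helper name running_in_project_env is hypothetical.

```python
import sys
from pathlib import Path


def running_in_project_env(env_dir: str = "venv") -> bool:
    """Return True if the active interpreter lives in ./<env_dir>.

    Assumes the script is run from the project root and that the
    environment was created with `conda create -p ./<env_dir> ...`.
    """
    expected = (Path.cwd() / env_dir).resolve()
    # For a conda environment, sys.prefix is the environment directory.
    return Path(sys.prefix).resolve() == expected


if __name__ == "__main__":
    print("Interpreter prefix:", sys.prefix)
    print("Using project environment:", running_in_project_env())
```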

Step 2: Set Up a GitHub Repository

Initialize the Repository

Navigate to your project directory and initialize a new Git repository:

git init

Create a README File

Create a README.md file in the project root folder (not inside venv):

echo "# My ML Project" > README.md

Add and Commit the README File

Add the README.md file to the repository and commit it:

git add README.md
git commit -m "Initial commit with README"

Create a .gitignore File

To keep certain files and directories out of Git, add a .gitignore file. On GitHub, create a .gitignore file in your repository and choose the Python template. This template ignores files and directories commonly produced by Python projects, including venv/.

Pull Changes from GitHub

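To bring the .gitignore from GitHub to your local machine, pull the changes. This assumes your local repository is already linked to the GitHub repository as the origin remote (for example with git remote add origin <your-repo-url>), that you have pushed your README commit to it, and that your local branch is named main (git branch -M main renames it if needed):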

git pull origin main

This ensures that your venv directory is not tracked by Git.
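If you create the .gitignore locally instead of through GitHub's template, the entry this setup relies on is venv/. Here is a minimal sketch of a helper that appends missing entries to a .gitignore in the project root. The name ensure_gitignore_entries is hypothetical, and it only adds the entries you pass, not GitHub's full Python template.

```python
from pathlib import Path
from typing import Iterable, List


def ensure_gitignore_entries(path: str = ".gitignore",
                             entries: Iterable[str] = ("venv/",)) -> List[str]:
    """Append any of `entries` missing from the .gitignore at `path`.

    Returns the entries that were added (an empty list if none were missing).
    """
    gitignore = Path(path)
    text = gitignore.read_text() if gitignore.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    # dict.fromkeys removes duplicates while keeping the given order.
    missing = [entry for entry in dict.fromkeys(entries) if entry not in present]
    if missing:
        # Start on a new line if the existing file lacks a trailing newline.
        prefix = "\n" if text and not text.endswith("\n") else ""
        with gitignore.open("a") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Running it a second time adds nothing, so it is safe to call from a setup script.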

Step 3: Create a requirements.txt File

Listing Dependencies

List all the libraries your project needs in a requirements.txt file. This allows others to install the same dependencies easily.

printf "numpy\npandas\nscikit-learn\n" > requirements.txt

Installing Dependencies

To install the libraries listed in requirements.txt, run:

pip install -r requirements.txt
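To check afterwards that everything in requirements.txt actually ended up in the environment, you can look each package up with the standard library's importlib.metadata. This is a simplified sketch: missing_requirements is a hypothetical helper, and it extracts the package name with a rough regex rather than a full requirement parser, skipping blank lines, comments and pip options such as -e .

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import List


def missing_requirements(file_path: str = "requirements.txt") -> List[str]:
    """Return requirement names from file_path that are not installed.

    Simplified: keeps only the name before any version specifier, extra
    or environment marker, and skips blank lines, '#' comments and pip
    options (lines starting with '-', such as '-e .').
    """
    missing = []
    with open(file_path) as file_obj:
        for line in file_obj:
            line = line.strip()
            if not line or line.startswith(("#", "-")):
                continue
            name = re.split(r"[\s\[<>=!~;]", line, maxsplit=1)[0]
            try:
                version(name)  # raises if the distribution is not installed
            except PackageNotFoundError:
                missing.append(name)
    return missing
```

An empty list from missing_requirements() means every listed package was found in the active environment.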

Step 4: Create setup.py

The setup.py script is essential for packaging our project. It automates package discovery and dependency management, making the project easier to distribute and install. The get_requirements function reads the required libraries from requirements.txt so that they are installed together with the package, while the setup function provides metadata about the package. This structure keeps the project well organized and easy to deploy.

Creating setup.py

Create a setup.py file with the following content:

from setuptools import find_packages, setup
from typing import List

def get_requirements(file_path: str) -> List[str]:
    '''This function returns a list of requirements for the project.'''
    requirements = []
    with open(file_path) as file_obj:
        requirements = file_obj.readlines()
        requirements = [req.strip() for req in requirements]
    return requirements

setup(
    name='mlproject01',
    version='0.1.0',
    author='Kanishk Munot',
    author_email='abc@gmail.com',
    packages=find_packages(),
    install_requires=get_requirements('requirements.txt')
)

Imports

from setuptools import find_packages, setup
from typing import List

Explanation:

  • from setuptools import find_packages, setup: setuptools is a package development and distribution library in Python. The find_packages function automatically discovers all packages and sub-packages in your project, while the setup function is used to define your package's attributes and metadata.

  • from typing import List: typing is a module that provides support for type hints. Here, List is used to specify that the function get_requirements will return a list of strings.

get_requirements Function

def get_requirements(file_path: str) -> List[str]:
    '''This function returns a list of requirements for the project.'''

    requirements = []

    with open(file_path) as file_obj:
        requirements = file_obj.readlines()
        requirements = [req.strip() for req in requirements]

    return requirements

Explanation:

  • def get_requirements(file_path: str) -> List[str]: This line defines a function named get_requirements that takes a file path as an argument and returns a list of strings. The type hints file_path: str and -> List[str] indicate that the input should be a string and the output will be a list of strings, respectively.

  • requirements = []: Initializes an empty list called requirements that will hold the package names.

  • with open(file_path) as file_obj: Opens the file specified by file_path and assigns it to the variable file_obj. The with statement ensures the file is properly closed when the block finishes, even if an error occurs.

  • requirements = file_obj.readlines(): Reads all lines from the file into a list. Each line holds one requirement, such as a package name, optionally with a version specifier.

  • requirements = [req.strip() for req in requirements]: This list comprehension iterates over each line and removes the trailing newline character (\n), along with any other surrounding whitespace.

  • return requirements: Returns the cleaned list of package names.
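To see exactly what get_requirements returns, the sketch below writes a throwaway requirements file (the blank line in it is deliberate) and reads it back with a copy of the strip()-based version from the full setup.py listing. Note that the blank line survives as an empty string, because the function as written does not filter it out.

```python
import os
import tempfile
from typing import List


def get_requirements(file_path: str) -> List[str]:
    '''This function returns a list of requirements for the project.'''
    requirements = []
    with open(file_path) as file_obj:
        requirements = file_obj.readlines()
        requirements = [req.strip() for req in requirements]
    return requirements


# Write a temporary requirements file and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "requirements.txt")
    with open(path, "w") as f:
        f.write("numpy\npandas\n\nscikit-learn\n")
    print(get_requirements(path))  # ['numpy', 'pandas', '', 'scikit-learn']
```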

setup function

setup(
    name='mlproject01',
    version='0.1.0',
    author='Kanishk Munot',
    author_email='abc@gmail.com',
    packages=find_packages(),
    install_requires=get_requirements('requirements.txt')
)

Explanation:

  • setup(): This function is used to specify the metadata and options for your package.

  • name='mlproject01': Sets the name of your package.

  • version='0.1.0': Specifies the version of your package. Versioning is crucial for distributing the package and for managing projects that depend on it.

  • author: Provides the name of the package author; replace the value with your own name.

  • author_email: Provides the author's email address; abc@gmail.com is only a placeholder.

  • packages=find_packages(): Automatically discovers all packages and sub-packages in your project directory, where a package is any directory containing an __init__.py file. This eliminates the need to specify each package manually.

  • install_requires=get_requirements('requirements.txt'): Calls the get_requirements function, which reads the requirements.txt file and returns the list of dependencies that must be installed along with this package.
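To make the find_packages rule concrete, here is a simplified stand-in written with the standard library only. It is not setuptools' code and has no include/exclude filters; it only mirrors the core rule: a directory counts as a package if it contains __init__.py, and the search only descends into directories that are themselves packages. The function name find_packages_sketch is hypothetical.

```python
import os
from typing import List


def find_packages_sketch(where: str = ".") -> List[str]:
    """Simplified stand-in for setuptools.find_packages().

    A directory counts as a package only if it contains __init__.py,
    and the search only descends into directories that are packages,
    so folders such as venv/ (no __init__.py) are skipped entirely.
    """
    packages: List[str] = []

    def walk(directory: str, prefix: str) -> None:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            is_package = (
                os.path.isdir(path)
                and "." not in name  # dotted names can't be package names
                and os.path.isfile(os.path.join(path, "__init__.py"))
            )
            if is_package:
                dotted = f"{prefix}.{name}" if prefix else name
                packages.append(dotted)
                walk(path, dotted)  # look for sub-packages inside it

    walk(where, "")
    return packages
```

For the layout built in this blog (src/__init__.py plus a venv folder), the sketch returns ['src'], which is what ends up in the packages argument.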

Step 5: Create a Source Folder

To organize your code, create a folder named src and add an __init__.py file:

mkdir src
echo "" > src/__init__.py

Note: You can also create the src folder and the empty __init__.py file manually.
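If you'd rather not depend on how echo behaves in your particular shell, the same folder and empty __init__.py can be created with pathlib. A minimal sketch; create_package_dir is a hypothetical helper name, and it is safe to re-run because it never overwrites an existing __init__.py.

```python
from pathlib import Path


def create_package_dir(root: str = ".", name: str = "src") -> Path:
    """Create <root>/<name>/ with an empty __init__.py (safe to re-run)."""
    package_dir = Path(root) / name
    package_dir.mkdir(parents=True, exist_ok=True)
    # touch() creates an empty file, or only updates the timestamp if it exists.
    (package_dir / "__init__.py").touch(exist_ok=True)
    return package_dir
```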

Step 6: Install Your Package

Run the following command to install your package:

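pip install .

This command installs your package along with its dependencies, making it available in your environment. Running python setup.py without a command does not install anything, and the older python setup.py install form is deprecated; pip install . is the recommended replacement.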
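To confirm the installation, you can ask importlib.metadata for the installed version of the distribution named in setup.py. A small sketch; installed_version is a hypothetical helper, and mlproject01 is the name used in this blog's setup.py.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "mlproject01") -> Optional[str]:
    """Return the installed version of dist_name, or None if it isn't installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # After a successful install this should print the version from setup.py.
    print(installed_version())
```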

Alternative Installation Method

You can also run pip install -e . to install your package in editable mode. This is useful during development, because changes to your code take effect without reinstalling the package.

Modifying requirements.txt

Add -e . to requirements.txt so that pip install -r requirements.txt also installs your project in editable mode:

echo "-e ." >> requirements.txt

Updating setup.py

Update setup.py so that get_requirements skips the -e . line:

from setuptools import find_packages, setup
from typing import List

HYPHEN_E_DOT = '-e .'

def get_requirements(file_path: str) -> List[str]:
    '''This function returns a list of requirements for the project.'''
    requirements = []
    with open(file_path) as file_obj:
        requirements = file_obj.readlines()
        requirements = [req.strip() for req in requirements]
        # '-e .' is a pip option, not a package, so drop it from install_requires
        if HYPHEN_E_DOT in requirements:
            requirements.remove(HYPHEN_E_DOT)
    return requirements

setup(
    name='mlproject01',
    version='0.1.0',
    author='Kanishk Munot',
    author_email='abc@gmail.com',
    packages=find_packages(),
    install_requires=get_requirements('requirements.txt')
)

Explanation of the Change

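The updated get_requirements function removes the -e . entry from the list of requirements. This matters because -e . is an instruction to pip, not a package name: passed to install_requires, it would be rejected as an invalid requirement and the installation would fail.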
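The version above removes only the exact string -e ., so blank lines and commented-out lines (such as the # -e . line suggested in the final steps) are passed through unchanged. A slightly more defensive variation, not the version used in this series, skips those as well:

```python
from typing import List

HYPHEN_E_DOT = "-e ."


def get_requirements(file_path: str) -> List[str]:
    """Return requirement strings, skipping blank lines, '#' comments and '-e .'."""
    requirements = []
    with open(file_path) as file_obj:
        for line in file_obj:
            req = line.strip()
            if not req or req.startswith("#") or req == HYPHEN_E_DOT:
                continue
            requirements.append(req)
    return requirements
```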

Final Steps

  1. Comment out the -e . entry in requirements.txt (prefix it with #) for now. We will use it in upcoming blogs.

  2. Commit all changes to the repository:

git add .
git commit -m "Set up project environment and configuration"
git push origin main

Conclusion

In this blog, we covered the essential steps to set up a robust project environment for your MLOps workflow: creating a virtual environment, setting up a GitHub repository, listing dependencies, and configuring setup.py to package our application. We value your feedback! Please share your thoughts and questions in the comments.

Read the next blog in the series on setting up template.py here.



Support Kanishk's Kaleidoscope by becoming a sponsor. Any amount is appreciated!