Illuminate

Data is like garbage. You’d better know what you are going to do with it before you collect it.

— Mark Twain

Illuminate aims to be a thin, "batteries included" ETL framework. It is built on prominent Python libraries such as Alembic, Click, Tornado and SQLAlchemy. The driver behind this project was the need for a rapid ETL and scraping framework that is both development and deployment friendly, as well as a wish to give something back to the community. The whole idea is heavily influenced by Django and Scrapy. It is tested with pytest, with the help of tox.

Installation

The package is available on PyPI. For this walkthrough, we will create an example project, simply called "tutorial", that uses an example shipped with the project files. In your shell, type the following:

mkdir tutorial
cd tutorial
python3 -m venv venv
source venv/bin/activate
pip install beautifulsoup4 illuminated

NOTE: The package name on PyPI is illuminated. Illuminate itself does not depend on beautifulsoup4, but this example does.

To verify that the installation succeeded, type:

illuminate --version

NOTE: From version 0.3.5 onward, Illuminate requires SQLAlchemy 2.0.37 or later.
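If the virtual environment already contained an older SQLAlchemy, the version requirement in the note can be checked by hand. The following is a minimal sketch, not part of Illuminate: version_at_least and check_sqlalchemy are hypothetical helper names, and the comparison assumes plain dotted numeric versions such as 2.0.37 (no pre-release suffixes).

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Assumption: both versions are plain dotted numbers, e.g. 2.0.37.
version_at_least() {
    # Sort the two versions numerically field by field; the smaller one comes first.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

# Hypothetical helper: compare the installed SQLAlchemy against 2.0.37.
# Reads the installed version through Python's importlib.metadata (Python 3.8+).
check_sqlalchemy() {
    installed=$(python3 -c "from importlib.metadata import version; print(version('SQLAlchemy'))") || return 1
    if version_at_least "$installed" 2.0.37; then
        echo "SQLAlchemy $installed satisfies the requirement"
    else
        echo "SQLAlchemy $installed is older than 2.0.37" >&2
        return 1
    fi
}
```

With the virtual environment active, calling check_sqlalchemy reports whether the installed version meets the requirement.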

Project Setup

Once the CLI is ready, create a project structure in the current directory by typing the following:

export ILLUMINATE_PGADMIN_PASSWORD=<PGADMIN_PASSWORD>
export ILLUMINATE_GRAFANA_PASSWORD=<GRAFANA_PASSWORD>
export ILLUMINATE_MAIN_DB_PASSWORD=<DB_PASSWORD>
export ILLUMINATE_MEASUREMENTS_DB_PASSWORD=<MEASUREMENTS_DB_PASSWORD>
illuminate manage project setup tutorial .

This creates a complete project structure with all the files and environment variables needed to run the example ETL flow.
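Because the setup command relies on the four exported passwords, it can help to confirm they are all set before running it. A minimal sketch under that assumption follows; missing_illuminate_vars is a hypothetical helper name, not an Illuminate command, and it only checks that each variable is non-empty.

```shell
# Print the name of every required variable that is unset or empty.
# The four names are the ones exported before "illuminate manage project setup".
missing_illuminate_vars() {
    for name in \
        ILLUMINATE_PGADMIN_PASSWORD \
        ILLUMINATE_GRAFANA_PASSWORD \
        ILLUMINATE_MAIN_DB_PASSWORD \
        ILLUMINATE_MEASUREMENTS_DB_PASSWORD
    do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

An empty result from missing_illuminate_vars means all four variables are present; otherwise the printed names show which exports are still missing.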

Use the provided docker-compose.yaml file to bring the environment up:

docker-compose up -d
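Containers started in detached mode may need a few seconds before PostgreSQL accepts connections. One way to wait is a small retry helper, sketched below. wait_until is a hypothetical name, and the usage line in the comment assumes the Compose service is called postgres and that its image ships pg_isready; check both against the generated docker-compose.yaml.

```shell
# Retry a command once per second until it succeeds or the attempts run out.
# Usage: wait_until <max_attempts> <command> [args...]
# Example (service name "postgres" is an assumption):
#   wait_until 30 docker-compose exec -T postgres pg_isready
wait_until() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        # Run the command quietly; stop as soon as it succeeds.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Pause only when another attempt is still to come.
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

The helper returns a non-zero status when every attempt fails, so it can gate later steps with the shell's && operator.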

Once the postgres and pgadmin containers are ready, perform a database migration: create a revision file and use it to upgrade the database. This creates a table representing ExampleModel, which is provided with the project files.

illuminate manage db revision
illuminate manage db upgrade

The resulting table serves as the Load destination for our example.

Execution

Now everything is set for Illuminate to start observing.

illuminate observe start

Docker distribution

Illuminate is also provided as a containerized distribution:

docker pull nikolamilojica/illuminate:latest

To use the Docker distribution from inside a project directory, type the following:

docker run -it --rm --network=host \
   -e ILLUMINATE_MAIN_DB_PASSWORD=<DB_PASSWORD> \
   -e ILLUMINATE_MEASUREMENTS_DB_PASSWORD=<MEASUREMENTS_DB_PASSWORD> \
   -v "$(pwd)":/root/illuminate \
   nikolamilojica/illuminate illuminate observe start