Illuminate
Data is like garbage. You’d better know what you are going to do with it before you collect it.
— Mark Twain
This code aims to be a thin, "batteries included" ETL framework. It is written using prominent Python frameworks such as Alembic, Click, Tornado and SQLAlchemy. The driver behind this project was the need for a framework with rapid ETL and scraping capabilities that is both development and deployment friendly, as well as a wish to give something back to the community. The whole idea is heavily influenced by Django and Scrapy. It is tested with pytest with the help of tox.
Installation
The package is available from PyPI. For this walkthrough, we will create an example project, simply called "tutorial", that uses an example from the project files. In your shell, type the following:
mkdir tutorial
cd tutorial
python3 -m venv venv
source venv/bin/activate
pip install beautifulsoup4 illuminated
NOTE: The package name on PyPI is illuminated. Illuminate does not depend on beautifulsoup4, but this example does.
If installation is successful, you can verify by typing:
NOTE: From version 0.3.5 on, the required SQLAlchemy version is 2.0.37 or above.
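The version requirement in the note can also be checked from Python. The following is a minimal sketch, not part of Illuminate: the helper names (parse_version, meets_minimum, installed_version) are hypothetical, and the version parsing is deliberately simplified (it compares only the first three numeric components and ignores pre-release markers).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum SQLAlchemy version stated in the note above.
MIN_SQLALCHEMY = "2.0.37"


def parse_version(text):
    """Return the first three numeric components as a tuple.

    Simplified on purpose: this is not a full PEP 440 parser, so a
    pre-release such as "2.0.37rc1" is treated like "2.0.37".
    """
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(found, minimum=MIN_SQLALCHEMY):
    """True if the found version is at least the minimum."""
    return parse_version(found) >= parse_version(minimum)


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

Inside the activated virtual environment, installed_version("illuminated") and installed_version("SQLAlchemy") should both return a version string, and meets_minimum can then be applied to the SQLAlchemy one.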
Project Setup
Once the CLI is ready, create a project structure in the current directory by typing the following:
export ILLUMINATE_PGADMIN_PASSWORD=<PGADMIN_PASSWORD>
export ILLUMINATE_GRAFANA_PASSWORD=<GRAFANA_PASSWORD>
export ILLUMINATE_MAIN_DB_PASSWORD=<DB_PASSWORD>
export ILLUMINATE_MEASUREMENTS_DB_PASSWORD=<MEASUREMENTS_DB_PASSWORD>
illuminate manage project setup tutorial .
This will create a complete project structure with all the files and ENV vars needed to run the example ETL flow.
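Since the setup step above is preceded by four exported password variables, a small pre-flight check can catch one that was forgotten. This is only a sketch: the variable names are taken from the export commands above, while the function missing_vars is a hypothetical helper and not part of the Illuminate CLI.

```python
import os

# Variable names copied from the export commands above.
REQUIRED_VARS = (
    "ILLUMINATE_PGADMIN_PASSWORD",
    "ILLUMINATE_GRAFANA_PASSWORD",
    "ILLUMINATE_MAIN_DB_PASSWORD",
    "ILLUMINATE_MEASUREMENTS_DB_PASSWORD",
)


def missing_vars(environ=None):
    """Return the required variables that are unset or empty.

    Pass a mapping to check something other than the real environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_vars() in the same shell session after the exports should return an empty list; any name it returns still needs to be exported.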
Use the provided docker-compose.yaml file and bring the environment up.
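After the environment is brought up, it can help to confirm that the database port accepts connections before moving on. The sketch below polls a TCP port using only the standard library. It is an illustration under assumptions: wait_for_port is a hypothetical helper, and the host and port to pass in depend on what the generated docker-compose.yaml publishes (5432 is the PostgreSQL default, but check the file).

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once a connection is accepted, or False if the
    timeout (in seconds) elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 5432) would be a reasonable call if the compose file maps the default port to the host. An open port only shows that the container is listening; the database may still need a moment before it accepts queries.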
Once the postgres and pgadmin containers are ready, perform a database migration: create a revision file and use it to upgrade the database, thus creating a table representing the ExampleModel provided with the project files.
This will create a table in the database that will be used as a Load destination for our example.
Execution
Now everything is set for Illuminate to start observing.
Docker distribution
Illuminate is provided as a containerized distribution as well:
To use the Docker distribution while inside a project directory, type the following: