Tutorial
In this tutorial, you will learn how to set up a Data7 project using the
Chinook database. Your project
will be named open-chinook
.
Install Data7
Data7 is a pure Python package hosted in PyPI. It can be installed using your favorite package manager.
Once installed, it comes with a
CLI. As long as the
installation directory is in your PATH
, the data7
command can be invoked as
follow:
Expected command usage output should look like the following:
Usage: data7 [OPTIONS] COMMAND [ARGS]...
╭─ Options ───────────────────────────────────────────────────────────────╮
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to │
│ copy it or customize the installation. │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────╮
│ check Check data7 project configuration. │
│ init Initialize a data7 project. │
│ run Run data7 web server. │
╰─────────────────────────────────────────────────────────────────────────╯
Download the Chinook database
As we need data to expose in this tutorial, we will download the excellent Chinook database in SQLite format using your favorite tool to do so:
Make sure downloaded databases exists and is not empty (expected size should be approximatively 1Mo):
Chinook database schema is available from the project's repository:
Initialize the project
Data7 requires few configuration files to be created before firing up the web
server. Those can be created using the data7 init
command.
Three configuration files should have been created:
settings.yaml
: general application settingsdata7.yaml
: exposed datasets configuration.secrets.yaml
: private credentials such as the database connection URL
#
# Data7 application general settings
#
# ---- GLOBAL ----------------------------------
global:
# The base url path for dataset urls
datasets_root_url: "/d"
# Pandas chunks
chunk_size: 5000
schema_sniffer_size: 1000
default_dtype_backend: pyarrow
# Pyinstrument
profiler_interval: 0.001
profiler_async_mode: enabled
# ---- DEFAULT ---------------------------------
default:
# Set debug to true for development, never for production!
debug: false
# Server
# host:
# port:
# Sentry
sentry_dsn: null
sentry_traces_sample_rate: 1.0
# Pyinstrument
profiling: false
# ---- PRODUCTION ------------------------------
production:
execution_environment: production
# Set debug to true for development, never for production!
debug: false
# Server
# host: data7.example.com
# port: 8080
# Sentry
# sentry_dsn:
# sentry_traces_sample_rate: 1.0
# Pyinstrument
profiling: false
#
# /!\ FEEL FREE TO REMOVE ENVIRONMENTS BELOW /!\
#
# ---- DEVELOPMENT -----------------------------
development:
execution_environment: development
debug: true
# Server
host: "127.0.0.1"
port: 8000
# Pyinstrument
profiling: true
# ---- TESTING ---------------------------------
testing:
execution_environment: testing
#
# Data7 secrets.
#
# Feel free to adapt this file given your needs and environment.
#
# ---- DEFAULT ---------------------------------
default:
# DATABASE_URL: "sqlite:///example.db"
# ---- PRODUCTION ------------------------------
production:
# DATABASE_URL: "sqlite:///example.db"
#
# /!\ FEEL FREE TO REMOVE ENVIRONMENTS BELOW /!\
#
# ---- DEVELOPMENT -----------------------------
development:
DATABASE_URL: "sqlite:///db/development.db"
# ---- TESTING ---------------------------------
testing:
DATABASE_URL: "sqlite:///db/tests.db"
#
# Data7 datasets definition.
#
# Feel free to adapt this file given your needs and environment.
#
# ---- DEFAULT ---------------------------------
default:
datasets: []
# ---- PRODUCTION ------------------------------
production:
datasets: []
#
# /!\ FEEL FREE TO REMOVE ENVIRONMENTS BELOW /!\
#
# ---- DEVELOPMENT -----------------------------
development:
datasets:
- basename: invoices
query: "SELECT * FROM Invoice"
- basename: tracks
query: >-
SELECT Artist.Name as artist, Album.Title as title, Track.Name as track
FROM Artist
INNER JOIN Album ON Artist.ArtistId = Album.ArtistId
INNER JOIN Track ON Album.AlbumId = Track.AlbumId
ORDER BY Artist.Name, Album.Title
# ---- TESTING ---------------------------------
testing:
datasets:
- basename: dummy
query: "SELECT 1"
Tip
As you may have noticed, Data7 supports configuring your application for
multiple environments (e.g. development, testing, staging, production, etc.).
In the following example, we will set up our environment as development
by
creating a .env
file at the project root as follow:
Configure Data7
Before running our application, we need to set/check default example settings
for the development environment. First make sure that the development
DATABASE_URL
points to downloaded Chinook database in the .secrets.yaml
file:
And check what datasets are defined in the data7.yaml
file:
# data7.yaml
development:
datasets:
- basename: invoices
query: "SELECT * FROM Invoice"
- basename: tracks
query: >-
SELECT Artist.Name as artist, Album.Title as title, Track.Name as track
FROM Artist
INNER JOIN Album ON Artist.ArtistId = Album.ArtistId
INNER JOIN Track ON Album.AlbumId = Track.AlbumId
ORDER BY Artist.Name, Album.Title
With this configuration, we expect to expose two datasets (invoices
and
tracks
). The invoices
dataset is a pretty simple SQL table export while the
tracks
dataset corresponds to a more complex query involving multiple SQL
tables.
Warning
The default configuration (settings.yaml
) used to develop the Data7 project has
been left untouched as it is perfectly fine for this tutorial. For a production
environment, we invite you to properly set the host
and port
definitions.
Validate configuration
To verify that your configuration respects project's requirements, Data7 comes
with a data7 check
command. It can be handy to debug common issues you may
have while setting up your project.
If things are not properly configured, the command output will let you know what seems buggy.
Run the server
Now that everything is perfectly configured, you can run the Data7 server using
the run
command:
And 💥 your Data7 server is running at: localhost:8000
Tip
The development server listen to the 8000
port, if it's already in use, you
may want to change the port
setting (for the development
environment) in
the settings.yaml
file or run the above command using another port:
If you are curious and test the root URL of your server, it should respond with
a 404 HTTP code (not found). This is perfectly expected as Data7 only serves
configured datasets following this pattern:
/d/<dataset basename>.<dataset extension>
. For the invoices
dataset (see
configuration above), it means that the two following endpoints should respond
properly:
You can test those using your favorite tool:
The following command should display a CSV file in your terminal:
Or download it in Parquet format:
Question
As you may have noticed, we've also defined a tracks
dataset. We invite you
to also test it following the previously defined URL pattern. And if you are
curious, explore the Chinook database schema and be creative: create new
datasets using more complex SQL queries. 💪
Wrap up
In this tutorial, you have learned to create a new Data7 project from scratch. The next step for real life usage is to connect Data7 with your production database (PostgreSQL, MariaDB, etc.) and define your own datasets.
Keep it up!