Google BigQuery

Optionally, for loading data from files into BigQuery, gcloud_gcs_bucket_name can be specified in the database initialization. The specified Google Cloud Storage bucket is then used as a cache for loading data, overcoming potential limitations. For more, see loading-data. By default, files are loaded directly from the local machine, as described in loading-local-data.

Installation

Use the bigquery extra to install all required packages.

$ pip install mara-db[bigquery]

The official bq and gcloud clients are required. See the Google Cloud SDK page for installation details.

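Enabling the BigQuery API and creating service account JSON credentials are also required, as described in the official documentation: https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries#before-you-begin

One-time authentication of the service account used: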

$ gcloud auth activate-service-account --key-file='path-to/service-account.json'
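For automated setups, the same one-time authentication can be scripted. The following is a minimal sketch, assuming the gcloud client is available on the PATH. The helper names build_activate_command and activate_service_account are hypothetical and not part of mara-db.

```python
import subprocess
from pathlib import Path


def build_activate_command(key_file: str) -> list:
    """Return the gcloud command line for activating a service account."""
    return ['gcloud', 'auth', 'activate-service-account', f'--key-file={key_file}']


def activate_service_account(key_file: str) -> None:
    """Run the one-time authentication.

    Raises FileNotFoundError if the key file is missing and
    subprocess.CalledProcessError if gcloud exits with an error.
    """
    if not Path(key_file).is_file():
        raise FileNotFoundError(f'Service account key file not found: {key_file}')
    subprocess.run(build_activate_command(key_file), check=True)
```

Checking for the key file first gives a clearer error message than the one gcloud would print for a wrong path.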

To read from STDIN, an additional Google Cloud Storage bucket is required as temporary storage.

Configuration examples

import mara_db.config
import mara_db.dbs

mara_db.config.databases = lambda: {
    'dwh': mara_db.dbs.BigQueryDB(
        service_account_json_file_name='service-account.json',
        location='EU',
        project='my-project-name',
        dataset='dwh'),
}
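To route file loads through a Cloud Storage bucket, gcloud_gcs_bucket_name can be added to the same configuration. The following is a sketch under assumptions: 'my-gcs-bucket' is a placeholder bucket name, and bigquery_settings is a hypothetical helper that only collects the keyword arguments documented in the API reference.

```python
def bigquery_settings(gcs_bucket_name=None):
    """Collect the keyword arguments for BigQueryDB (placeholder values)."""
    settings = {
        'service_account_json_file_name': 'service-account.json',
        'location': 'EU',
        'project': 'my-project-name',
        'dataset': 'dwh',
    }
    if gcs_bucket_name:
        # Optional: bucket used as cache when loading data from files
        settings['gcloud_gcs_bucket_name'] = gcs_bucket_name
    return settings


def databases():
    """Return the database configuration with the bucket enabled."""
    # Imported here so that bigquery_settings can be used without mara-db installed
    import mara_db.dbs
    return {'dwh': mara_db.dbs.BigQueryDB(**bigquery_settings('my-gcs-bucket'))}
```

The databases function can then be assigned to mara_db.config.databases in place of the lambda shown above.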


API reference

This section documents the database-specific API of the module.

Configuration

class mara_db.dbs.BigQueryDB(service_account_json_file_name: str, location: Optional[str] = None, project: Optional[str] = None, dataset: Optional[str] = None, gcloud_gcs_bucket_name=None, use_legacy_sql: bool = False)
__init__(service_account_json_file_name: str, location: Optional[str] = None, project: Optional[str] = None, dataset: Optional[str] = None, gcloud_gcs_bucket_name=None, use_legacy_sql: bool = False)

Connection information for a BigQuery database.

Enabling the BigQuery API and service account JSON credentials are required. For more, see: https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries#before-you-begin

Parameters
  • service_account_json_file_name – The name of the private key file provided by Google when creating a service account (in JSON format)

  • location – Default geographic location to use when creating datasets or determining where jobs should run

  • project – Default project to use for requests.

  • dataset – Default dataset to use for requests.

  • gcloud_gcs_bucket_name – The Google Cloud Storage bucket used as cache for loading data

  • use_legacy_sql – (default: false) If true, the legacy BigQuery SQL dialect is used.
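Because service_account_json_file_name must point to a valid key file, a quick sanity check before building the configuration can catch a wrong path early. The following is a minimal sketch, assuming the usual layout of Google service account key files (a JSON object with "type": "service_account" and a "project_id" field). read_project_id is a hypothetical helper and not part of mara-db.

```python
import json


def read_project_id(key_file: str) -> str:
    """Return the project id stored in a service account key file.

    Raises ValueError if the file is not a service account key.
    """
    with open(key_file, encoding='utf-8') as f:
        key = json.load(f)
    if key.get('type') != 'service_account':
        raise ValueError(f'{key_file} is not a service account key file')
    return key['project_id']
```

The returned value could be passed as the project parameter when the default project should match the one the service account belongs to.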

property sqlalchemy_url

Returns the SQLAlchemy URL for the database.
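To illustrate what such a URL may look like: the sqlalchemy-bigquery dialect accepts URLs of the form bigquery://project/dataset. The sketch below builds a URL in that form. It is an assumption about the shape of the value, not mara-db's actual implementation, and bigquery_url is a hypothetical helper.

```python
from typing import Optional


def bigquery_url(project: Optional[str] = None, dataset: Optional[str] = None) -> str:
    """Build a bigquery:// URL from an optional project and dataset."""
    url = 'bigquery://'
    if project:
        url += project
        # In this sketch the dataset is only appended when a project is given
        if dataset:
            url += '/' + dataset
    return url
```

With the values from the configuration example, bigquery_url('my-project-name', 'dwh') yields 'bigquery://my-project-name/dwh'.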

General helper functions

Data modelling helper functions