Google BigQuery
Optionally, gcloud_gcs_bucket_name can be specified in the database initialization for loading data from files into BigQuery. The specified Google Cloud Storage bucket is then used as a cache for loading data, overcoming potential limitations. For more, see loading-data. By default, files are loaded directly from the local machine, as described in loading-local-data.
Installation
Use the bigquery extra to install all required packages.
$ pip install mara-db[bigquery]
The official bq and gcloud clients are required. See the Google Cloud SDK page for installation details.
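A quick way to confirm that both command line clients are reachable is to look them up on the PATH. The sketch below uses only the Python standard library; the helper name missing_clients is an illustrative assumption and not part of mara_db.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_clients(
    names: Iterable[str] = ("bq", "gcloud"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of the clients that cannot be found on the PATH.

    Hypothetical helper for illustration only. The `which` argument is
    injectable so the lookup can be replaced in tests.
    """
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_clients()
    if missing:
        print("Missing Google Cloud SDK clients: " + ", ".join(missing))
    else:
        print("bq and gcloud are available")
```

An empty result only shows that the executables exist; it does not verify that they are authenticated.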
The BigQuery API must be enabled and service account JSON credentials must be available, as described in the official documentation.
Authenticate the service account once:
$ gcloud auth activate-service-account --key-file='path-to/service-account.json'
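When this step is part of an automated setup, the same command can be issued from Python. This is a minimal sketch, assuming gcloud is on the PATH; both function names are hypothetical, and only the subcommand and the --key-file flag shown above are used.

```python
import subprocess
from typing import List


def activate_service_account_command(key_file: str) -> List[str]:
    """Build the argument list for the gcloud command shown above."""
    return ["gcloud", "auth", "activate-service-account", f"--key-file={key_file}"]


def activate_service_account(key_file: str) -> None:
    """Run the command; raises CalledProcessError if gcloud exits non-zero."""
    subprocess.run(activate_service_account_command(key_file), check=True)
```

Passing the arguments as a list avoids shell quoting issues when the key file path contains spaces.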
To read from STDIN, an additional Google Cloud Storage bucket is required as temporary storage.
Configuration examples
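# Without a Cloud Storage bucket: files are loaded directly from the local machine
import mara_db.config
import mara_db.dbs

mara_db.config.databases = lambda: {
    'dwh': mara_db.dbs.BigQueryDB(
        service_account_json_file_name='service-account.json',
        location='EU',
        project='my-project-name',
        dataset='dwh'),
}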
# With a Cloud Storage bucket used as cache for loading data
import mara_db.config
import mara_db.dbs

mara_db.config.databases = lambda: {
    'dwh': mara_db.dbs.BigQueryDB(
        service_account_json_file_name='service-account.json',
        location='EU',
        project='my-project-name',
        dataset='dwh',
        gcloud_gcs_bucket_name='my-temp-bucket'),
}
API reference
This section documents the database-specific API of the module.
Configuration
- class mara_db.dbs.BigQueryDB(service_account_json_file_name: str, location: Optional[str] = None, project: Optional[str] = None, dataset: Optional[str] = None, gcloud_gcs_bucket_name=None, use_legacy_sql: bool = False)
- __init__(service_account_json_file_name: str, location: Optional[str] = None, project: Optional[str] = None, dataset: Optional[str] = None, gcloud_gcs_bucket_name=None, use_legacy_sql: bool = False)
Connection information for a BigQuery database.
The BigQuery API must be enabled and service account JSON credentials are required. For details, see https://cloud.google.com/bigquery/docs/quickstarts/quickstart-client-libraries#before-you-begin
- Parameters
service_account_json_file_name – The name of the private key file (in JSON format) provided by Google when creating a service account.
location – Default geographic location to use when creating datasets or determining where jobs should run.
project – Default project to use for requests.
dataset – Default dataset to use for requests.
gcloud_gcs_bucket_name – The Google Cloud Storage bucket used as cache for loading data.
use_legacy_sql – (default: False) If true, the legacy BigQuery SQL dialect is used.
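The defaults in the signature above mean that only the key file name is mandatory. The sketch below mirrors the documented signature with a stand-in dataclass so that it runs without mara_db installed; BigQueryDBSketch is a hypothetical name and not the library's class, and the Optional[str] annotation on gcloud_gcs_bucket_name is an assumption, since the signature leaves it untyped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BigQueryDBSketch:
    """Stand-in mirroring the documented BigQueryDB signature (not the real class)."""
    service_account_json_file_name: str
    location: Optional[str] = None
    project: Optional[str] = None
    dataset: Optional[str] = None
    gcloud_gcs_bucket_name: Optional[str] = None  # annotation is an assumption
    use_legacy_sql: bool = False


# Only the key file is given; every other parameter keeps its documented default.
minimal = BigQueryDBSketch(service_account_json_file_name="service-account.json")

# All optional connection parameters set explicitly, including the cache bucket.
full = BigQueryDBSketch(
    service_account_json_file_name="service-account.json",
    location="EU",
    project="my-project-name",
    dataset="dwh",
    gcloud_gcs_bucket_name="my-temp-bucket",
)
```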
- property sqlalchemy_url
Returns the SQLAlchemy URL for the database.
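For orientation, the SQLAlchemy BigQuery dialect (the sqlalchemy-bigquery package) accepts URLs of the form bigquery://project/dataset. The sketch below composes such a URL from the project and dataset values. It is an assumption about the general shape of the value, not a copy of the property's implementation, and bigquery_url is a hypothetical helper.

```python
from typing import Optional


def bigquery_url(project: Optional[str] = None, dataset: Optional[str] = None) -> str:
    """Compose a bigquery:// URL from an optional project and dataset.

    Hypothetical helper for illustration; in this sketch the dataset is
    only appended when a project is given.
    """
    url = "bigquery://"
    if project:
        url += project
        if dataset:
            url += "/" + dataset
    return url
```

With the values from the configuration examples, bigquery_url('my-project-name', 'dwh') yields bigquery://my-project-name/dwh.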