BigQuery

This guide explains how to set up a dedicated Service Account and Custom Role in Google Cloud Platform (GCP) to allow Solid to analyze your BigQuery environment and build data lineage and usage profiles.

Overview

Solid requires specific permissions to crawl metadata and query history from your BigQuery environment.

Permissions Required

  • Project Access: To list projects under your account
  • Dataset & Table Access: To extract metadata (schemas, clustering, descriptions)
  • Job Access: To read query logs from completed jobs
  • [Optional] Read Access: To collect data samples (statistics/null counts)

IMPORTANT: You must have Owner or IAM Admin permissions on the relevant GCP project(s) to create these roles and service accounts.


Step 1: Create a Custom Role

To follow the principle of least privilege, create a custom role with only the necessary job-related permissions.

Instructions

  1. In the GCP Console, go to IAM & Admin → Roles
  2. Select your project at the top of the dashboard
  3. Click + Create Role
  4. Enter the following details:
    • Title: Solid Data Role
    • Role launch stage: General Availability
  5. Click + Add Permissions and filter/add the following:
    • bigquery.jobs.create
    • bigquery.jobs.get
    • bigquery.jobs.list
    • bigquery.jobs.listAll
    • bigquery.tables.getData (Optional: Only if data profiling is required)
  6. Click Create
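If you prefer the command line, the same role can be created with the gcloud CLI. This is a sketch: the role ID solid_data_role and PROJECT_ID are placeholders, not names Solid requires.

```shell
# Create a custom role containing only the job-related permissions from Step 1.
# "solid_data_role" and PROJECT_ID are placeholders -- substitute your own values.
gcloud iam roles create solid_data_role \
    --project="PROJECT_ID" \
    --title="Solid Data Role" \
    --stage="GA" \
    --permissions="bigquery.jobs.create,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll"

# Append bigquery.tables.getData to the --permissions list only if data profiling is required.
```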

Step 2: Create a Service Account

Instructions

  1. In the GCP Console, go to IAM & Admin → Service Accounts
  2. Click + Create Service Account
  3. Enter the following details:
    • Name: Solid Data
    • Click Create and Continue
  4. Assign Roles: Add the following two roles:
    • BigQuery Metadata Viewer (Predefined)
    • Solid Data Role (The custom role you created in Step 1)
  5. Click Done
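The service account creation and role grants above can also be scripted. The account ID solid-data and the custom role ID solid_data_role below are assumptions mirroring the console names:

```shell
# Create the service account (display name matches Step 2).
gcloud iam service-accounts create solid-data \
    --project="PROJECT_ID" \
    --display-name="Solid Data"

# Grant the predefined BigQuery Metadata Viewer role plus the custom role from Step 1.
SA_EMAIL="solid-data@PROJECT_ID.iam.gserviceaccount.com"
gcloud projects add-iam-policy-binding "PROJECT_ID" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/bigquery.metadataViewer"
gcloud projects add-iam-policy-binding "PROJECT_ID" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="projects/PROJECT_ID/roles/solid_data_role"
```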

Supporting Multi-Project Access

If your data is spread across multiple GCP projects, repeat the following for each project:

  1. Go to the target project's IAM & Admin page
  2. Click Grant Access
  3. Enter the email address of the service account you just created
  4. Assign the same two roles listed above:
    • BigQuery Metadata Viewer
    • Solid Data Role
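For many projects, the per-project grants can be applied in a loop. The project IDs and role ID below are placeholders; note that project-level custom roles exist only in the project where they were created, so the custom role must be created in each target project (or once at the organization level):

```shell
# Service account created in Step 2 (placeholder email).
SA_EMAIL="solid-data@PROJECT_ID.iam.gserviceaccount.com"

for project in "project-a" "project-b"; do
    gcloud projects add-iam-policy-binding "${project}" \
        --member="serviceAccount:${SA_EMAIL}" \
        --role="roles/bigquery.metadataViewer"
    # Assumes the custom role was also created in ${project}.
    gcloud projects add-iam-policy-binding "${project}" \
        --member="serviceAccount:${SA_EMAIL}" \
        --role="projects/${project}/roles/solid_data_role"
done
```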

Step 3: Create a JSON Key for Solid

  1. In the Service Accounts list, click on the account you just created (Solid Data)
  2. Go to the Keys tab
  3. Click Add Key → Create new key
  4. Select JSON and click Create
  5. A JSON file will download automatically

Security Note: Keep this JSON file secure—you will upload it to the Solid platform. Do not share it or commit it to version control.
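The key can also be generated from the CLI. The output filename is an example; the email must match the service account you created:

```shell
# Creates a new JSON key and writes it to the given file.
gcloud iam service-accounts keys create solid-data-key.json \
    --iam-account="solid-data@PROJECT_ID.iam.gserviceaccount.com"
```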


Step 4: Configure Solid

Once you have the JSON key:

  1. Log into the Solid platform
  2. Navigate to Settings → Integrations → BigQuery
  3. Upload the JSON Key file
  4. Click Test Connection
    • If the connection fails, verify that:
      • The service account has the correct roles assigned
      • The JSON key file is valid and not corrupted
      • All required permissions are granted
  5. Once successful, select the Datasets you wish to monitor
  6. Click Save
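Before uploading the key, you can optionally sanity-check it from your own machine with the gcloud and bq CLIs. This is a quick smoke test, not part of the Solid setup itself:

```shell
# Authenticate as the service account using the downloaded key.
gcloud auth activate-service-account --key-file="solid-data-key.json"

# Listing datasets confirms the key is valid and metadata access is granted.
bq ls --project_id="PROJECT_ID"
```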

Solid will begin indexing your BigQuery metadata.


What Data Does Solid Collect?

For transparency, here are the specific queries and data sources Solid uses when collecting information from BigQuery.

Note: BigQuery calculates usage and identifies users differently from Snowflake: Solid parses query logs rather than reading aggregated DML history tables.

Metadata & Lineage

Solid joins COLUMNS, TABLES, TABLE_OPTIONS, and COLUMN_FIELD_PATHS to fetch descriptions and clustering information:

SELECT 
    t1.*, 
    t2.*, 
    t3.option_value AS TABLE_COMMENT, 
    t4.description AS COLUMN_COMMENT
FROM 
    `project.region.INFORMATION_SCHEMA.COLUMNS` t1
LEFT JOIN `project.region.INFORMATION_SCHEMA.TABLES` t2 
    ON t1.table_schema = t2.table_schema
    AND t1.table_name = t2.table_name
LEFT JOIN `project.region.INFORMATION_SCHEMA.TABLE_OPTIONS` t3 
    ON t1.table_schema = t3.table_schema
    AND t1.table_name = t3.table_name 
    AND t3.option_name = 'description'
LEFT JOIN `project.region.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS` t4 
    ON t1.table_schema = t4.table_schema
    AND t1.table_name = t4.table_name 
    AND t1.column_name = t4.field_path
WHERE 
    t1.table_schema IN ('your_datasets');

Usage & User Tracking

Unlike Snowflake, BigQuery does not have an aggregated DML history table. Instead, Solid identifies usage and active users by parsing the Query Logs from INFORMATION_SCHEMA.JOBS_BY_PROJECT.

  • Usage: Aggregates inserted_row_count, deleted_row_count, etc., from JOBS_BY_PROJECT where the statement type is INSERT, UPDATE, DELETE, or MERGE
  • Users: Extracts user_email and last_success_login from the query history to identify active data consumers
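The exact query Solid runs is internal, but against the JOBS_BY_PROJECT schema the usage metric can be sketched roughly as follows. The dml_statistics struct is a real column of that view; using MAX(end_time) as the last-activity timestamp is an assumption made here for illustration:

```sql
SELECT 
    user_email,
    statement_type,
    SUM(dml_statistics.inserted_row_count) AS inserted_rows,
    SUM(dml_statistics.updated_row_count) AS updated_rows,
    SUM(dml_statistics.deleted_row_count) AS deleted_rows,
    MAX(end_time) AS last_activity
FROM 
    `project.region.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
WHERE 
    statement_type IN ('INSERT', 'UPDATE', 'DELETE', 'MERGE')
    AND state = 'DONE'
GROUP BY 
    user_email, statement_type;
```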

Query History

Solid collects job information from the INFORMATION_SCHEMA.JOBS_BY_PROJECT view to understand query patterns and data access:

SELECT 
    job_id,
    creation_time,
    start_time,
    end_time,
    user_email,
    statement_type,
    query,
    total_bytes_processed,
    total_slot_ms,
    state,
    error_result
FROM 
    `project.region.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
WHERE 
    creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 365 DAY)
    AND state = 'DONE'
    AND statement_type IN ('SELECT', 'INSERT', 'UPDATE', 'DELETE', 'MERGE')
ORDER BY 
    creation_time DESC;

Data Profiling (Optional)

Note: This only applies if you granted the bigquery.tables.getData permission in Step 1.

Solid samples up to 100,000 rows per table to profile your data:

Numeric Columns

SELECT 
    'project:dataset.table.column' AS column_name,
    'NUMERIC' AS data_type,
    MIN(column) AS min_value,
    MAX(column) AS max_value,
    AVG(column) AS avg_value,
    COUNT(*) AS row_count,
    COUNT(DISTINCT column) AS distinct_count,
    COUNTIF(column IS NULL) AS null_count
FROM (
    SELECT column FROM `project.dataset.table` LIMIT 100000
);

Text Columns

SELECT 
    'project:dataset.table.column' AS column_name,
    'TEXT' AS data_type,
    COUNT(*) AS row_count,
    COUNT(DISTINCT column) AS distinct_count,
    COUNTIF(column IS NULL) AS null_count,
    STRING_AGG(DISTINCT column, '|||' ORDER BY column LIMIT 100) AS text_distinct_values
FROM (
    SELECT column FROM `project.dataset.table` LIMIT 100000
);

Date Columns

SELECT 
    'project:dataset.table.column' AS column_name,
    'DATE' AS data_type,
    COUNT(*) AS row_count,
    COUNT(DISTINCT column) AS distinct_count,
    COUNTIF(column IS NULL) AS null_count,
    MIN(column) AS date_min_value,
    MAX(column) AS date_max_value
FROM (
    SELECT column FROM `project.dataset.table` LIMIT 100000
);

Troubleshooting

Connection Issues

If you're having trouble connecting:

  1. Verify the JSON key file is valid and properly formatted
  2. Check service account permissions by going to IAM & Admin → IAM and confirming the service account has the required roles
  3. Ensure the custom role was created with all necessary permissions
  4. Verify project access if working across multiple projects
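One quick structural check on the key file (step 1 above) can be done with a few lines of Python from the shell. A valid service account key has type "service_account" and contains client_email and private_key fields. The heredoc below creates a stand-in file with redacted values purely for illustration; point the check at your real downloaded key instead:

```shell
# Stand-in key file with the fields a real key would contain (values redacted).
cat > sample-key.json <<'EOF'
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "REDACTED",
  "private_key": "REDACTED",
  "client_email": "solid-data@my-project.iam.gserviceaccount.com",
  "client_id": "REDACTED",
  "token_uri": "https://oauth2.googleapis.com/token"
}
EOF

# Fail loudly if the file is not parseable JSON or is missing required fields.
python3 - <<'EOF'
import json

key = json.load(open("sample-key.json"))
required = {"type", "client_email", "private_key", "token_uri"}
missing = required - key.keys()
assert key.get("type") == "service_account", "not a service account key"
assert not missing, f"missing fields: {missing}"
print("key structure looks valid")
EOF
```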

Permission Issues

If Solid can't access certain datasets or tables:

  1. Check the service account roles in each project that contains data you want to monitor
  2. Verify dataset permissions by ensuring the service account has access to specific datasets
  3. Confirm the custom role includes all required permissions listed in Step 1

Data Profiling Not Working

If data profiling is failing:

  1. Verify you granted the bigquery.tables.getData permission in the custom role
  2. Check table-level permissions to ensure the service account can read data
  3. Review BigQuery quotas to ensure you're not hitting API limits

Security Best Practices

  • Store the JSON key securely and never commit it to version control
  • Rotate service account keys regularly (recommended every 365 days)
  • Use the principle of least privilege by only granting bigquery.tables.getData if data profiling is absolutely necessary
  • Monitor service account activity through GCP's audit logs
  • Restrict service account access to only the projects and datasets that need to be monitored