Prompt Title: ds

Prompt Detail:

Here are the solutions to each task.

Task 1: Create a SQL query to create the table with the specified schema:

sql
CREATE TABLE `project_id.dataset_name.table_name` (
  rental_id INT64,
  duration INT64,
  end_date TIMESTAMP,
  end_station_id INT64,
  end_station_name STRING,
  start_date TIMESTAMP,
  start_station_id INT64,
  start_station_name STRING,
  end_station_logical_terminal INT64,
  start_station_logical_terminal INT64
);

Task 2: Use the Airflow BigQueryOperator to create the table:

python
from datetime import datetime, timedelta

from airflow import DAG
# The Google provider package does not export a class named BigQueryOperator;
# BigQueryInsertJobOperator is the provider's operator for running a query job.
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'bigquery_table_creation',
    default_args=default_args,
    description='Create BigQuery table',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2023, 3, 14),
    catchup=False,
)

# IF NOT EXISTS keeps the daily run from failing once the table has been created.
create_table = BigQueryInsertJobOperator(
    task_id='create_table',
    configuration={
        'query': {
            'query': '''CREATE TABLE IF NOT EXISTS `project_id.dataset_name.table_name` (
                rental_id INT64,
                duration INT64,
                end_date TIMESTAMP,
                end_station_id INT64,
                end_station_name STRING,
                start_date TIMESTAMP,
                start_station_id INT64,
                start_station_name STRING,
                end_station_logical_terminal INT64,
                start_station_logical_terminal INT64
            );''',
            'useLegacySql': False,
        }
    },
    dag=dag,
)

Task 3: Count the daily number of trips in 2016-2017 and the cumulative number of trips:

sql
WITH daily_trips AS (
  SELECT
    DATE(start_date) AS trip_date,
    COUNT(rental_id) AS num_trips
  FROM `bigquery-public-data.london_bicycles.cycle_hire`
  WHERE EXTRACT(YEAR FROM start_date) BETWEEN 2016 AND 2017
  GROUP BY trip_date
),
cumulative_trips AS (
  SELECT
    trip_date,
    num_trips,  -- carried through so the final SELECT can reference it
    SUM(num_trips) OVER (ORDER BY trip_date) AS cumulative_trips
  FROM daily_trips
)
SELECT trip_date, num_trips, cumulative_trips
FROM cumulative_trips
ORDER BY trip_date
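
The cumulative count in Task 3 is a running total over the daily counts, ordered by date. As a rough illustration of what the `SUM(...) OVER (ORDER BY trip_date)` window computes, here is a minimal plain-Python sketch. The data and the helper name `with_running_total` are made up for the example and are not part of the BigQuery dataset.

```python
from itertools import accumulate

# Hypothetical toy data: (trip_date, num_trips), already grouped per day.
daily_trips = [
    ("2016-01-02", 5),
    ("2016-01-01", 3),
    ("2016-01-03", 2),
]


def with_running_total(rows):
    """Return (trip_date, num_trips, cumulative_trips) rows ordered by date."""
    # ISO-formatted date strings sort chronologically.
    ordered = sorted(rows)
    # accumulate() yields the running sum of the daily counts.
    totals = accumulate(num for _, num in ordered)
    return [(day, num, total) for (day, num), total in zip(ordered, totals)]


result = with_running_total(daily_trips)
for row in result:
    print(row)
```

Each output row carries the day's own count alongside the total of all days up to and including it, which is the shape the final `SELECT` in the query returns.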

Task 4: Count the number of trips by start station:

sql
SELECT
  start_station_name,
  COUNT(rental_id) AS num_trips
FROM `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY start_station_name
ORDER BY num_trips DESC

Task 5: Identify a station with the longest average trip duration using the bigquery-public-data.new_york.tlc_yellow_trips_2015 table:

sql
SELECT
  start_station_name,
  AVG(duration) AS avg_duration
FROM `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY start_station_name
ORDER BY avg_duration DESC
LIMIT 1
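
One way to sanity-check the shape of the Task 5 query without touching BigQuery is to run the same GROUP BY / AVG / ORDER BY / LIMIT pattern against a small in-memory table. The sketch below uses Python's built-in `sqlite3` module as a stand-in. The local `cycle_hire` table, the station names, and the durations are all invented for the example, and SQLite's dialect is not identical to BigQuery's.

```python
import sqlite3

# Same aggregation pattern as the Task 5 query, pointed at a local toy table.
QUERY = """
SELECT start_station_name, AVG(duration) AS avg_duration
FROM cycle_hire
GROUP BY start_station_name
ORDER BY avg_duration DESC
LIMIT 1
"""


def longest_avg_station(rows):
    """Load (station, duration) rows into SQLite and return the top station."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cycle_hire (start_station_name TEXT, duration INTEGER)"
        )
        conn.executemany("INSERT INTO cycle_hire VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


# Hypothetical sample rows: station name and trip duration.
sample = [
    ("Station A", 600),
    ("Station A", 1200),
    ("Station B", 300),
    ("Station B", 500),
]

top = longest_avg_station(sample)
print(top)
```

With this sample, Station A averages 900 and Station B averages 400, so the query returns the Station A row.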

Please note that Task 5 names the bigquery-public-data.new_york.tlc_yellow_trips_2015 table, which appears unrelated to the bike-hire data used in the other tasks. The query above assumes you meant the bigquery-public-data.london_bicycles.cycle_hire table instead.
