Even if you are not already familiar with SQL, it’s not rocket science. The learning curve is not that steep, and you can master SQL with some dedication and effort. In this tutorial, you will learn the fundamentals of SQL and BigQuery so you can get started on the right footing.
Introduction to Google BigQuery SQL
Besides its performance and scalability features, what makes BigQuery so popular is its ease of use. Because Google BigQuery uses SQL as its query language, which is the standard query language for many popular database and data warehouse systems, database developers and analysts are already familiar with it. Here is an example of an SQL query as you would run it in BigQuery:
SELECT first_name, last_name, city FROM `project.dataset.example` WHERE city LIKE '%West%'
Check out this SQL video tutorial for beginners made by Railsware Product Academy.
Which types of SQL does BigQuery use?
When BigQuery was first introduced, all executed queries used a non-standard SQL dialect known as BigQuery SQL. After the new version of BigQuery was released (BigQuery 2.0), Standard SQL was supported, and BigQuery SQL was renamed Legacy SQL. Currently, BigQuery supports both types (or dialects) of SQL: Standard SQL and Legacy SQL. You can use the dialect of your choice without any performance issues.
Difference between Standard and Legacy SQL
The main difference between Standard SQL and Legacy SQL lies in their supported functions and query structure. Furthermore, Standard SQL complies with the SQL 2011 standard, and has extensions that support querying nested and repeated data, provide data types to handle custom fields such as ARRAY and STRUCT, and allow building complex joins of different data. This is the main reason that Google recommends Standard SQL as the query syntax.
Below is a quick example of how we can calculate the number of distinct users in both dialects. As you can see, while Legacy SQL requires a dedicated function to calculate the result, Standard SQL is more flexible in terms of syntax.
#legacySQL
SELECT EXACT_COUNT_DISTINCT(user) FROM V;
#standardSQL
SELECT COUNT(DISTINCT user) FROM V;
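The nested and repeated data support mentioned above can be sketched as follows. This is only an illustration: the orders data, its column names, and the lines field are hypothetical and built inline, so the query does not depend on any existing table.

```sql
#standardSQL
-- Hypothetical inline data: each order carries a repeated field (an ARRAY of STRUCTs).
WITH orders AS (
  SELECT 1 AS order_id,
         [STRUCT('apple' AS item, 2 AS qty), STRUCT('pear' AS item, 1 AS qty)] AS lines
  UNION ALL
  SELECT 2,
         [STRUCT('plum' AS item, 5 AS qty)]
)
-- UNNEST flattens the array so that each STRUCT becomes its own row.
SELECT o.order_id, line.item, line.qty
FROM orders AS o
CROSS JOIN UNNEST(o.lines) AS line
ORDER BY o.order_id, line.item;
```

The query returns one row per order line (three rows here), which is the kind of nested-data query that Legacy SQL handles less directly.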
How to switch to Standard SQL
For new projects, Google uses Standard SQL in Query settings by default, so you don’t have to do anything further. For existing projects, you can use a query prefix in the query console to define the dialect. To do that, you can just write:
- #legacySQL – to run the query using Legacy SQL
- #standardSQL – to run the query using Standard SQL
In the Cloud Console, when you use a query prefix, the SQL dialect option is disabled in the Query settings. If you want to change the dialect using the Query settings, you can:
- Open BigQuery Console.
- Click “More” > “Query Settings”.
- Select the SQL dialect of preference.
What is BigQuery BI Engine?
While BigQuery is great for storage and business analysis purposes, there are cases where speed requirements are high and responses must arrive in less than a couple of seconds. This is where the BigQuery BI Engine comes in. BigQuery BI Engine is a fast, in-memory analysis service with which you can analyze data stored in BigQuery with sub-second query response time and high concurrency.
In order to enable BI Engine for your BigQuery project, you can:
- Open BigQuery Admin Console BI Engine.
- Click “Create reservation”.
- Verify your project name, select your location, and adjust the slider so you can give the amount of memory needed for your calculations.
- Click “NEXT” and then “Create” to create your reservation for this project and enable the BI Engine.
You can then connect popular BI tools, for example Tableau, Google Data Studio, Looker or Power BI, to BigQuery to accelerate data exploration and analysis. Just like the BigQuery service overall, the BigQuery BI Engine comes with a free tier, but depending on usage, you may incur an extra cost on top of BigQuery.
BigQuery SQL Syntax
In order to fetch data from BigQuery tables and analyze it, you will have to write query statements in SQL that scan one or more tables and return the computed result rows. “What’s a query statement?” you may ask. Let’s start by explaining the basic terminology around SQL:
- Query expression: a set of instructions that tells the database what action it should perform.
- Query statement: one or many query expressions executed by the database system all together.
- Clause: a specific command that forms a query expression. Popular SQL clauses include SELECT, FROM, WHERE, and GROUP BY.
- Function: a predefined set of clauses that perform a complex action. Examples of SQL functions are COUNT, SUM, and so on.
- Operation: when a specific query statement is executed and the database system interacts with the data. The available operations a system can perform are CREATE, READ, UPDATE, and DELETE.
- Record: a set of information in structured data. If we visualize the data in a tabular format, a record can also be described as a row.
- Column: a piece of information that belongs to a record, just like when we read the data in a tabular format.
Below we’ll show you the query syntax for SQL queries in BigQuery. When writing a query expression, we basically write multiple clauses. The clauses must appear in the following sequence:
query_expr:
SELECT ...
FROM ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...
In the following sections, we’ll demonstrate everything on a sample dataset to give a better visual representation of the queried table along with the computed result rows. If you do not have any data loaded into BigQuery yet, you can use Coupler.io to import your data from different sources or Google Sheets in a quick and easy way. Just export your data and, by following a really simple flow, you’ll have everything loaded from Google Sheets to BigQuery in a matter of minutes.
BigQuery SQL Syntax: SELECT list
The SELECT list defines the set of columns that the query should return. The items within the SELECT list can refer to columns in any of the items within the FROM clause. Each item in the SELECT list can be any of the below:
SELECT *
Often referred to as SELECT star, this expression returns all of the corresponding columns from each item of the FROM list. Let’s say we have a table called CLIENTS that contains the first name, last name, email, and phone number for each customer. The following query returns every column of that table:
SELECT * FROM `CLIENTS`;
SELECT expression
All the items in the SELECT list can be expressions. These expressions can describe a single column and produce one output column. For example, using the same CLIENTS table:
SELECT first_name, email FROM `CLIENTS`;
BigQuery SQL Syntax: FROM clause
The FROM clause defines the source from which we’ll select the data. This can be a single table or multiple tables at once, but in the latter case we also have to define how these tables are joined together. We’ll talk about the JOIN operation in the next subsection. In the meantime, below you can find a few ways to use the FROM clause if we want to query data from the CLIENTS table: specifying only the table (which requires a default dataset to be set), the dataset and the table, or the fully qualified project, dataset, and table:
SELECT * FROM `CLIENTS`;
SELECT * FROM `dataset.CLIENTS`;
SELECT * FROM `project.dataset.CLIENTS`;
Specifying the dataset or project is also highly convenient when you are managing multiple projects and datasets and you need to be specific about which project or dataset to use.
BigQuery SQL Syntax: JOIN operation
The JOIN operations are possibly among the most important operations in SQL, as they help you combine data from multiple tables or views. There are several types of joins, and we’ll go through each one of them below:
INNER JOIN (or just JOIN)
INNER JOIN, or just JOIN, is the operation that effectively calculates the Cartesian product of the two tables and keeps only the row pairs that satisfy the join condition. Basically, JOIN selects all records that have matching values in both tables.
Let’s see how we can use JOIN to combine data from the two tables below:
Table A
a | b |
---|---|
1 | aaa |
2 | bbb |
3 | ccc |
Table B
c | d |
---|---|
1 | fff |
2 | ccc |
4 | ggg |
If we use the query below, we’ll end up with a final table containing the rows that share the same value in column a (from Table A) and column c (from Table B), as we define with the ON condition.
SELECT * FROM A INNER JOIN B ON A.a = B.c
a | b | c | d |
---|---|---|---|
1 | aaa | 1 | fff |
2 | bbb | 2 | ccc |
CROSS JOIN
As opposed to the INNER JOIN, the CROSS JOIN operation returns the Cartesian product of the two tables regardless of a common value.
We will use CROSS JOIN to combine data from the two tables below:
Table A
a | b |
---|---|
1 | aaa |
2 | bbb |
Table B
c | d |
---|---|
1 | fff |
3 | ccc |
We can use the query below, and we’ll end up with a final table containing every combination of rows from both tables. As you can see, there is no ON condition, as the tables don’t need to share a common value.
SELECT * FROM A CROSS JOIN B
a | b | c | d |
---|---|---|---|
1 | aaa | 1 | fff |
1 | aaa | 3 | ccc |
2 | bbb | 1 | fff |
2 | bbb | 3 | ccc |
FULL JOIN (or FULL OUTER JOIN)
The FULL JOIN or FULL OUTER JOIN returns all records from both the left (Table A) and the right (Table B) table, whether or not they have a match on the other side. As a result, FULL OUTER JOIN can return very large result sets. Unlike the CROSS JOIN operation, it uses a matching condition to pair rows where a match exists; rows without a match are kept and padded with NULL. Let’s see FULL OUTER JOIN in action.
We will use FULL OUTER JOIN to combine data from the two tables below:
Table A
a | b |
---|---|
1 | aaa |
2 | bbb |
Table B
c | d |
---|---|
1 | fff |
3 | ccc |
We can use the following query, and we’ll end up with a final table containing all the data from both tables:
SELECT * FROM A FULL OUTER JOIN B ON A.a = B.c
a | b | c | d |
---|---|---|---|
1 | aaa | 1 | fff |
2 | bbb | NULL | NULL |
NULL | NULL | 3 | ccc |
LEFT JOIN (or LEFT OUTER JOIN)
The LEFT JOIN or LEFT OUTER JOIN operation always returns all the rows from the left item in the FROM clause, even if no rows in the right item satisfy the join predicate. This is different from the FULL OUTER JOIN, as the rows that exist only in the second table are not returned. Let’s see this type of join in action.
Using the same tables as before, we can calculate the LEFT JOIN like this:
Table A
a | b |
---|---|
1 | aaa |
2 | bbb |
Table B
c | d |
---|---|
1 | fff |
3 | ccc |
SELECT * FROM A LEFT OUTER JOIN B ON A.a = B.c
a | b | c | d |
---|---|---|---|
1 | aaa | 1 | fff |
2 | bbb | NULL | NULL |
RIGHT JOIN (or RIGHT OUTER JOIN)
Similar to the LEFT JOIN, the RIGHT JOIN (or RIGHT OUTER JOIN) operation behaves the same way, only it focuses on the right item of the FROM list. Let’s see that in action.
Calculating the RIGHT JOIN on the tables below returns these results:
Table A
a | b |
---|---|
1 | aaa |
2 | bbb |
Table B
c | d |
---|---|
1 | fff |
3 | ccc |
SELECT * FROM A RIGHT OUTER JOIN B ON A.a = B.c
a | b | c | d |
---|---|---|---|
1 | aaa | 1 | fff |
NULL | NULL | 3 | ccc |
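To compare the join types side by side, here is a minimal, self-contained sketch. It assumes the two-row versions of Table A and Table B used above, rebuilt inline under the hypothetical names table_a and table_b so that nothing needs to be loaded first; the join_type column is added only for labeling.

```sql
#standardSQL
-- Inline stand-ins for Table A and Table B (names are illustrative).
WITH table_a AS (
  SELECT 1 AS a, 'aaa' AS b UNION ALL
  SELECT 2, 'bbb'
),
table_b AS (
  SELECT 1 AS c, 'fff' AS d UNION ALL
  SELECT 3, 'ccc'
)
SELECT 'INNER' AS join_type, ta.a, ta.b, tb.c, tb.d
FROM table_a AS ta INNER JOIN table_b AS tb ON ta.a = tb.c
UNION ALL
SELECT 'LEFT', ta.a, ta.b, tb.c, tb.d
FROM table_a AS ta LEFT JOIN table_b AS tb ON ta.a = tb.c
UNION ALL
SELECT 'RIGHT', ta.a, ta.b, tb.c, tb.d
FROM table_a AS ta RIGHT JOIN table_b AS tb ON ta.a = tb.c
UNION ALL
SELECT 'FULL', ta.a, ta.b, tb.c, tb.d
FROM table_a AS ta FULL OUTER JOIN table_b AS tb ON ta.a = tb.c
ORDER BY join_type, a, c;
```

With this data, the INNER join contributes one row, LEFT and RIGHT two rows each, and FULL three rows, matching the result tables shown in the sections above.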
BigQuery SQL Syntax: WHERE clause
The WHERE clause is as simple as it is crucial for queries. Basically, this clause filters the results of the FROM clause. The syntax follows this structure:
SELECT * FROM `CLIENTS` WHERE bool_expression
Only rows whose bool_expression evaluates to TRUE are included. Rows whose bool_expression evaluates to NULL or FALSE are discarded. For example, take the CLIENTS table from the previous sections.
We can use the WHERE clause to filter and keep only the people named “Antonio”. Let’s see this query:
SELECT * FROM `CLIENTS` WHERE first_name = "Antonio"
This returns only the rows that match this condition.
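The rule that NULL results are discarded is easy to overlook, so here is a small sketch of it. The clients data and the city column are hypothetical and defined inline.

```sql
#standardSQL
-- Hypothetical inline data; Maria has no city recorded.
WITH clients AS (
  SELECT 'Antonio' AS first_name, 'Rome' AS city UNION ALL
  SELECT 'Maria', NULL UNION ALL
  SELECT 'Jack', 'Oslo'
)
SELECT first_name, city
FROM clients
-- For Maria, city != 'Rome' evaluates to NULL, so her row is dropped as well.
WHERE city != 'Rome';
```

Only Jack is returned. If rows with a missing city should be kept, the condition would need to be written as city != 'Rome' OR city IS NULL.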
BigQuery SQL Syntax: GROUP BY clause
When we’re analyzing data, we normally perform calculations on it to get better insights. This is achieved by using aggregate functions, like counting the results or summing some metrics. We’ll talk about aggregate functions in a following subsection, but when we want to aggregate metrics, it’s important to define which columns will be aggregated and which will not be.
The GROUP BY clause groups together rows in a table with non-distinct values for the expression in the GROUP BY clause. Let’s see an example:
SELECT first_name, COUNT(company) AS number_of_companies FROM `CLIENTS` GROUP BY first_name
The above query counts the companies recorded for people who share the same first name. This is achieved by counting the company values and grouping them by every distinct first name.
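As a self-contained sketch of the same idea, the query below groups hypothetical inline data by first name. It also contrasts COUNT(company), which counts every non-NULL value, with COUNT(DISTINCT company), which counts each company once.

```sql
#standardSQL
-- Hypothetical inline data; Antonio appears twice with the same company.
WITH clients AS (
  SELECT 'Antonio' AS first_name, 'Acme' AS company UNION ALL
  SELECT 'Antonio', 'Globex' UNION ALL
  SELECT 'Antonio', 'Acme' UNION ALL
  SELECT 'Maria', 'Initech'
)
SELECT
  first_name,
  COUNT(company) AS number_of_company_rows,           -- Antonio: 3, Maria: 1
  COUNT(DISTINCT company) AS number_of_distinct_companies  -- Antonio: 2, Maria: 1
FROM clients
GROUP BY first_name
ORDER BY first_name;
```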
BigQuery SQL Syntax: HAVING clause
The HAVING clause is similar to the WHERE clause, only it focuses on the results produced by the GROUP BY clause. If aggregation is present, the HAVING clause is evaluated once for every aggregated row in the result set. For example, if we want to return the number of different companies that any person named “Antonio” works for, we can use this query:
SELECT first_name, COUNT(company) AS number_of_companies FROM `CLIENTS` GROUP BY first_name HAVING first_name="Antonio"
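HAVING is most useful when the filter depends on an aggregated value, which WHERE cannot see. A minimal sketch, again over hypothetical inline data:

```sql
#standardSQL
-- Hypothetical inline data.
WITH clients AS (
  SELECT 'Antonio' AS first_name, 'Acme' AS company UNION ALL
  SELECT 'Antonio', 'Globex' UNION ALL
  SELECT 'Maria', 'Initech'
)
SELECT first_name, COUNT(company) AS number_of_companies
FROM clients
GROUP BY first_name
-- Evaluated after grouping: keeps only names with more than one company row.
HAVING COUNT(company) > 1;
```

Only the Antonio group (2 rows) passes the filter.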
BigQuery SQL Syntax: ORDER BY clause
Using the ORDER BY clause, we can define exactly how the result set will be ordered and with which precedence. The ORDER BY clause comes at the end of each query to sort the final result set. For example, if we want to sort the result set shown in the previous subsection based on the first name in ascending order, we can do this:
SELECT first_name, COUNT(company) AS number_of_companies FROM `CLIENTS` GROUP BY first_name HAVING first_name="Antonio" ORDER BY first_name ASC
If the ORDER BY clause is not present, the order of the results of a query is not defined.
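Precedence matters once more than one sort key is involved. The sketch below, using hypothetical inline data, sorts by the aggregated count first and uses the first name only to break ties.

```sql
#standardSQL
-- Hypothetical inline data.
WITH clients AS (
  SELECT 'Antonio' AS first_name, 'Acme' AS company UNION ALL
  SELECT 'Antonio', 'Globex' UNION ALL
  SELECT 'Maria', 'Initech' UNION ALL
  SELECT 'Jack', 'Acme'
)
SELECT first_name, COUNT(company) AS number_of_companies
FROM clients
GROUP BY first_name
-- Highest count first; names with equal counts are sorted alphabetically.
ORDER BY number_of_companies DESC, first_name ASC;
```

The rows come back as Antonio (2), Jack (1), Maria (1).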
BigQuery SQL Syntax: LIMIT & OFFSET clauses
Two other useful clauses are LIMIT and OFFSET. Using LIMIT followed by a non-negative integer, we can specify the maximum number of rows that will be returned from the query. LIMIT 0 returns 0 rows. OFFSET specifies a non-negative number of rows to skip before applying LIMIT. These clauses accept only literal or parameter values. Below is an example showing how we can return 10 rows after we skip the first 5 of the result set.
SELECT * FROM `CLIENTS` LIMIT 10 OFFSET 5;
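Because the row order is undefined without ORDER BY, LIMIT and OFFSET are usually combined with it to get repeatable pages. A small sketch over generated numbers (no table required):

```sql
#standardSQL
-- GENERATE_ARRAY(1, 20) produces the integers 1 through 20.
SELECT n
FROM UNNEST(GENERATE_ARRAY(1, 20)) AS n
ORDER BY n
-- Skip the first 5 rows, then return the next 10: the values 6 through 15.
LIMIT 10 OFFSET 5;
```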
BigQuery SQL Syntax: WITH clause
The WITH clause is a great way to temporarily store a subquery and use it each time it is needed. A subquery is a complete query expression within parentheses. This means that it is evaluated first, and its results can be used by the query expression outside of the parentheses.
The WITH clause contains one or more named subqueries that execute every time a subsequent SELECT statement references them.
WITH subQ1 AS ( SELECT first_name FROM CLIENTS ) SELECT * FROM subQ1
In the above query, we store a simple query and name it “subQ1”, and then we can reference it in a FROM clause just like any other table.
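A WITH clause can hold several named subqueries, and a later one can build on an earlier one. The following sketch assumes hypothetical inline data and illustrative names (clients, companies_per_name):

```sql
#standardSQL
WITH clients AS (
  -- Hypothetical inline data standing in for a real table.
  SELECT 'Antonio' AS first_name, 'Acme' AS company UNION ALL
  SELECT 'Antonio', 'Globex' UNION ALL
  SELECT 'Maria', 'Initech'
),
companies_per_name AS (
  -- The second subquery references the first one by name.
  SELECT first_name, COUNT(company) AS number_of_companies
  FROM clients
  GROUP BY first_name
)
SELECT first_name
FROM companies_per_name
WHERE number_of_companies > 1;
```

The final SELECT returns only Antonio. Splitting the logic into named steps like this tends to be easier to read than nesting the subqueries inside each other.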
BigQuery SQL Syntax: Using aliases
A great way to optimize your code and make it clearer is by using aliases. An alias is a temporary name given to a table, column, or expression present in a query. In the previous subsection, we introduced a subquery called “subQ1”. This is an alias that allows us to reference a whole query with a single name. Furthermore, tables can have long names (e.g. whether_conditions_uk_2016), which is not ideal when we want to reference them. An alias can help us create a short name to use in our queries.
An example of a table alias is shown below, where we apply an alias to the whether_conditions_uk_2016 table to call it “example_name”:
SELECT * FROM whether_conditions_uk_2016 AS example_name
BigQuery SQL Important Functions
Now that we have seen how to construct a query in BigQuery SQL and what the syntax is, it’s important to see the core functions that you can use to power up your analytics capabilities. We’ll discuss several types of functions, and we’ll show the most popular and useful ones that will help you extract core insights from your data.
BigQuery SQL Functions: Aggregate functions
Aggregate functions are possibly the most used type of SQL function. An aggregate function summarizes the rows of a group into a single value. Let’s see the most common ones along with some examples:
AVG
The AVG function takes any numeric input type and returns the average, or NaN if the input contains a NaN value. Let’s see a quick example using the dataset below:
name | age |
---|---|
John Doe | 37 |
John Davis | 42 |
Jessie Cole | 19 |
Jack Delaney | 74 |
If we want to find the average age of our group, we can use this query:
SELECT AVG(age) AS avg_age FROM CLIENTS
The result set will look like this:
avg_age |
---|
43.0 |
COUNT
The COUNT function takes any row input and returns the number of rows with a non-NULL value in the input. Let’s see a quick example using the dataset below:
name | age |
---|---|
John Doe | 37 |
John Davis | 42 |
Jessie Cole | 19 |
Jack Delaney | 74 |
If we want to find the total count of people in our group, we can use this query:
SELECT COUNT(name) AS number_of_people FROM CLIENTS
The result set will look like this :
number_of_people |
---|
4 |
MAX
The MAX function takes any row input and returns the maximum value of non-NULL rows in the input. Let’s see a quick example using the dataset below:
name | age |
---|---|
John Doe | 37 |
John Davis | 42 |
Jessie Cole | 19 |
Jack Delaney | 74 |
If we want to find the oldest age in our group, we can use this query:
SELECT MAX(age) AS max_age FROM CLIENTS
The result set will look like this :
max_age |
---|
74 |
MIN
As opposed to the MAX function, the MIN function takes any row input and returns the minimum value of non-NULL rows in the input. Using the dataset below:
name | age |
---|---|
John Doe | 37 |
John Davis | 42 |
Jessie Cole | 19 |
Jack Delaney | 74 |
If we want to find the youngest age in our group, we can use this query:
SELECT MIN(age) AS min_age FROM CLIENTS
The result set will look like this :
min_age |
---|
19 |
SUM
Last but not least, the most popular aggregate function is SUM. This function takes any numeric non-NULL row input and returns the sum of the values. We can use the dataset below:
name | purchases |
---|---|
John Doe | 12 |
John Davis | 1 |
Jessie Cole | 22 |
Jack Delaney | 44 |
If we want to find the total number of purchases, we can use this query:
SELECT SUM(purchases) AS total_purchases FROM CLIENTS
The result set will look like this :
total_purchases |
---|
79 |
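Several aggregate functions can be combined in a single query. The sketch below rebuilds the name and age dataset from the examples above inline (the CTE name clients is illustrative), so it runs without any table:

```sql
#standardSQL
-- Inline copy of the name/age dataset used in the examples above.
WITH clients AS (
  SELECT 'John Doe' AS name, 37 AS age UNION ALL
  SELECT 'John Davis', 42 UNION ALL
  SELECT 'Jessie Cole', 19 UNION ALL
  SELECT 'Jack Delaney', 74
)
SELECT
  COUNT(name) AS number_of_people,  -- 4
  AVG(age) AS avg_age,              -- 43.0
  MIN(age) AS min_age,              -- 19
  MAX(age) AS max_age,              -- 74
  SUM(age) AS total_age             -- 172
FROM clients;
```

Note that the alias follows each expression in the SELECT list; an alias placed after the table name would rename the table instead of the output column.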
BigQuery SQL Functions: Mathematical functions
Mathematical functions come in handy when you want to quickly calculate results between two or more numbers. Basically, all mathematical functions have the following behaviors:
- They return NULL if any of the input parameters is NULL.
- They return NaN if any of the arguments is NaN.
Let’s take a closer look at some of the most popular ones:
RAND
RAND generates a pseudo-random value of type FLOAT64 in the range of [0, 1), inclusive of 0 and exclusive of 1. It doesn’t need any arguments, and you can use it like this:
RAND()
SQRT
SQRT computes the square root of the input X. It generates an error if X is less than 0. Below is a quick example of its usage:
SQRT(X)
POW
POW or POWER returns the value of input X raised to the power of input Y.
POW(X, Y)
DIV
DIV is the simple division between two integers. It returns the result of integer division of input X by input Y. Division by zero returns an error, and division by -1 may overflow.
DIV(X, Y)
SAFE_DIVIDE
If you need to avoid the error in your query (e.g. if you divide by zero), you can use SAFE_DIVIDE. This function is equivalent to the division operator (X / Y), not to the integer division of DIV, but returns NULL if an error occurs, such as a division by zero error.
SAFE_DIVIDE(X, Y)
ROUND
ROUND rounds the input X to the nearest integer. If a second input N is present, ROUND rounds X to N decimal places after the decimal point. If N is negative, it rounds off digits to the left of the decimal point.
ROUND(X [, N])
CEIL
If you want to control the rounding rules and always round up, you should use CEIL or CEILING. This function returns the smallest integral value that is not less than X.
CEIL(X)
FLOOR
As opposed to CEIL, if you want to always round down to the largest integral value that is not greater than X, you should use the FLOOR function.
FLOOR(X)
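The signatures above can be tried out with literal values. A minimal sketch (the column aliases are illustrative only):

```sql
#standardSQL
SELECT
  SQRT(16) AS square_root,           -- 4.0
  POW(2, 10) AS power,               -- 1024.0
  DIV(7, 2) AS integer_division,     -- 3
  SAFE_DIVIDE(7, 2) AS safe_ratio,   -- 3.5
  SAFE_DIVIDE(7, 0) AS safe_by_zero, -- NULL instead of an error
  ROUND(2.567, 2) AS rounded,        -- 2.57
  ROUND(1234.5, -2) AS rounded_left, -- 1200.0
  CEIL(2.1) AS rounded_up,           -- 3.0
  FLOOR(-2.1) AS rounded_down;       -- -3.0
```

The SAFE_DIVIDE(7, 2) result shows that it performs regular division rather than the integer division of DIV.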
BigQuery SQL Functions: String functions
String functions are really useful for manipulating text fields. These string functions work on two different value types: the STRING and BYTES data types. STRING values must be well-formed UTF-8. Let’s see some of the most useful string functions.
CONCAT
CONCAT concatenates one or more values into a single result. All values must be BYTES or data types that can be cast to STRING.
SELECT CONCAT("Hello", " ", "World") AS concatenated_string
concatenated_string |
---|
Hello World |
ENDS_WITH
ENDS_WITH takes two STRING or BYTES values and returns TRUE if the second value is a suffix of the first one; otherwise it returns FALSE. Let’s see an example applying this function to the dataset below.
fruit |
---|
Banana |
Apple |
Orange |
Apricot |
SELECT ENDS_WITH(fruit, "e") AS example FROM FRUITS
Based on the above dataset and using the query, we end up with this result set:
example |
---|
FALSE |
TRUE |
TRUE |
FALSE |
STARTS_WITH
Similarly, STARTS_WITH takes two STRING or BYTES values and returns TRUE if the second value is a prefix of the first one; otherwise it returns FALSE. A similar example using the dataset below would be:
fruit |
---|
Banana |
Apple |
Orange |
Apricot |
SELECT STARTS_WITH(fruit, "A") AS example FROM FRUITS
Based on the above dataset and using the query, we end up with this result set:
example |
---|
FALSE |
TRUE |
FALSE |
TRUE |
LOWER
Another simple yet useful string function is LOWER. It takes a STRING or BYTES input and transforms it to lowercase. For example:
SELECT LOWER("Hello") AS lowered_string
lowered_string |
---|
hello |
UPPER
Similar to the LOWER function is UPPER, but instead of transforming the input to lowercase, it transforms it to uppercase.
SELECT UPPER("Hello") AS uppered_string
uppered_string |
---|
HELLO |
REGEXP_CONTAINS
REGEXP_CONTAINS returns TRUE if the value is a partial match for the regular expression. If the regular expression argument is invalid, the function returns an error. Let’s see an example using the dataset below:
email |
---|
test@example.com |
another_test@example.com |
wow.test.com |
example.com/test@ |
We can use REGEXP_CONTAINS to check whether the email looks valid, as shown below. The pattern is written as a raw string (the r prefix) so that the backslash reaches the regular expression unchanged:
SELECT email, REGEXP_CONTAINS(email, r"@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+") AS is_valid FROM EMAILS
email | is_valid |
---|---|
test@example.com | TRUE |
another_test@example.com | TRUE |
wow.test.com | FALSE |
example.com/test@ | FALSE |
REGEXP_EXTRACT
A great use of regular expression string functions is extracting a certain part of a string. REGEXP_EXTRACT returns the substring in the input value that matches the regular expression, and NULL if there is no match. Using the dataset below, we can extract the first part of the email address using the following query:
email |
---|
test@example.com |
another_test@example.com |
SELECT REGEXP_EXTRACT(email, r"^[a-zA-Z0-9_.+-]+") AS user_name FROM EMAILS
user_name |
---|
test |
another_test |
REGEXP_REPLACE
REGEXP_REPLACE returns a string where all substrings of the input value that match the input regular expression are replaced with the replacement value. Let’s see an example where we replace every “pie” occurrence with “jam”:
SELECT REGEXP_REPLACE("Apple pie or Cherry pie?", "pie", "jam") AS re
re |
---|
Apple jam or Cherry jam? |
REPLACE
Similar to REGEXP_REPLACE, but without the use of RegEx, is the REPLACE function. It replaces all occurrences of the “from” input value with the “to” input value in the original input value. If the “from” input value is empty, no replacement is made.
SELECT REPLACE("Apple pie", "pie", "jam") AS re
re |
---|
Apple jam |
SPLIT
SPLIT splits the input value using the given delimiter. For STRING input values, the default delimiter is the comma. Splitting an empty STRING input returns an ARRAY with a single empty STRING.
SELECT SPLIT("Apple pie or not?", " ") AS re
re |
---|
[Apple, pie, or, not?] |
SUBSTR
Last but not least, SUBSTR returns a substring of the supplied STRING or BYTES value. The position argument is an integer specifying the starting position of the substring, with 1 indicating the first character. If the position is negative, the function counts from the end of the input value, with -1 indicating the last character. For example:
SELECT SUBSTR("apple", 3) AS substring
substring |
---|
---|
ple |
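The negative position and the optional length argument can be sketched with the same input string (the column aliases are illustrative):

```sql
#standardSQL
SELECT
  SUBSTR("apple", 3) AS from_third,      -- "ple"
  SUBSTR("apple", -2) AS last_two,       -- "le": counts from the end of the string
  SUBSTR("apple", 2, 3) AS three_chars;  -- "ppl": start at position 2, take 3 characters
```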
BigQuery SQL Functions : Array functions
Array functions allow you to manipulate an array and its elements. In BigQuery, an array is an ordered list consisting of zero or more values of the same data type. You can create an array to group similar data together rather than having it split into columns, and then you can use it as needed. Below you will learn how to handle arrays using array functions.
ARRAY
The ARRAY function returns an array with one element for each row in a subquery. For example, the query below generates a final array with all the elements of the subquery.
SELECT ARRAY (SELECT "a" UNION ALL SELECT "b" UNION ALL SELECT "c") AS generated_array
generated_array |
---|
["a", "b", "c"] |
ARRAY_CONCAT
ARRAY_CONCAT concatenates one or more arrays with the same element type into a single array.
SELECT ARRAY_CONCAT(["a", "b"], ["c", "d"], ["e", "f"]) AS starting_alphabet
starting_alphabet |
---|
["a", "b", "c", "d", "e", "f"] |
ARRAY_LENGTH
ARRAY_LENGTH returns the size of the array, and zero if the array is empty.
SELECT ARRAY_LENGTH(["a", "b"]) AS array_length
array_length |
---|
2 |
ARRAY_TO_STRING
ARRAY_TO_STRING transforms arrays to strings by concatenating all of the elements. This function takes an array and a delimiter value as input, like this:
SELECT ARRAY_TO_STRING(["a", "b"], "--") AS array_to_string
array_to_string |
---|
a--b |
BigQuery SQL Functions: Date functions
Date functions are great when you need to handle dates in your dataset. Let’s see the most common date functions that BigQuery offers:
CURRENT_DATE
CURRENT_DATE returns the current date for the specified or the default time zone.
SELECT CURRENT_DATE() AS today;
today |
---|
2021-05-10 |
EXTRACT
When you’re using EXTRACT, providing a DATE field allows you to extract a specific part of the date. The provided part must be any of the below:
- DAYOFWEEK: value from 1 to 7. Sunday is the first day of the week.
- DAY: value from 1 to 31 depending on the month.
- DAYOFYEAR: value from 1 to 365 (or 366 for leap years).
- WEEK: value from 0 to 53. Sunday is the first day of the week. For the dates prior to the first Sunday of the year, the function will return 0.
- WEEK(<WEEKDAY>): value from 0 to 53. The specified WEEKDAY is the first day of the week. For the dates prior to the first WEEKDAY of the year, the function will return 0.
- ISOWEEK: value from 1 to 53 using the ISO 8601 week boundaries. Monday is the first day of the ISOWEEK. The first ISOWEEK of each ISO year begins on the Monday before the first Thursday of the Gregorian calendar year.
- MONTH: value from 1 to 12. January is the first month of the year.
- QUARTER: value from 1 to 4.
- YEAR: year value in the format YYYY.
- ISOYEAR: year value using the ISO 8601 format boundaries.
Let’s see a quick example of how to use the EXTRACT function:
SELECT EXTRACT(MONTH FROM DATE '2021-05-01') AS the_month
the_month |
---|
5 |
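Several parts can be extracted from the same date in one query. A short sketch using the date from the example above (the aliases are illustrative; 2021-05-01 was a Saturday):

```sql
#standardSQL
SELECT
  EXTRACT(YEAR FROM d) AS the_year,             -- 2021
  EXTRACT(QUARTER FROM d) AS the_quarter,       -- 2
  EXTRACT(MONTH FROM d) AS the_month,           -- 5
  EXTRACT(DAYOFYEAR FROM d) AS the_day_of_year, -- 121
  EXTRACT(DAYOFWEEK FROM d) AS the_day_of_week  -- 7, because Sunday counts as 1
FROM (SELECT DATE '2021-05-01' AS d);
```

EXTRACT returns integers, so the month comes back as 5 rather than as the zero-padded string 05.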
DATE
The DATE function is responsible for constructing a DATE field from integer values representing the year, month, and day.
SELECT DATE(2018, 11, 23) AS specific_date
specific_date |
---|
2018-11-23 |
DATE_ADD
DATE_ADD adds a specific time interval to an input DATE value. The interval must be provided as an input and can be any of the below:
- DAY
- WEEK
- MONTH
- QUARTER
- YEAR
You can use the DATE_ADD function like this:
SELECT DATE_ADD(DATE "2018-11-23", INTERVAL 2 DAY) AS two_days_later
two_days_later |
---|
2018-11-25 |
DATE_SUB
DATE_SUB works similarly to DATE_ADD, but instead of adding a specific time interval to the input DATE value, it subtracts the interval from it. For example:
SELECT DATE_SUB(DATE "2018-11-23", INTERVAL 2 DAY) AS two_days_earlier
two_days_earlier |
---|
2018-11-21 |
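DATE_DIFF
DATE_DIFF calculates the number of whole specified intervals between two DATE objects. The result is the first date minus the second one, so the later date goes first to get a positive number.
SELECT DATE_DIFF(DATE '2018-11-29', DATE '2018-11-25', DAY) AS day_difference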
day_difference |
---|
4 |
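The order of the two dates determines the sign of the result, which the following sketch makes explicit (the aliases are illustrative):

```sql
#standardSQL
SELECT
  -- Later date first: positive difference.
  DATE_DIFF(DATE '2018-11-29', DATE '2018-11-25', DAY) AS later_minus_earlier,  -- 4
  -- Earlier date first: negative difference.
  DATE_DIFF(DATE '2018-11-25', DATE '2018-11-29', DAY) AS earlier_minus_later;  -- -4
```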
FORMAT_DATE
FORMAT_DATE formats the input DATE field to the specified format string. The supported format elements for a DATE field are listed in the BigQuery documentation.
SELECT FORMAT_DATE("%b %d, %Y", DATE "2018-11-22") AS new_format
new_format |
---|
Nov 22, 2018 |
PARSE_DATE
PARSE_DATE is possibly the simplest of the date functions, but also the most used one. It converts a string representation of a date to a DATE object. In order for PARSE_DATE to work, the two inputs (format string and date string) have to match.
For example, the query below will work because the two inputs match:
SELECT PARSE_DATE("%b %e %Y", "Dec 25 2018")
On the other hand, the following query will not work because the two inputs do not match:
SELECT PARSE_DATE("%b %e %Y", "Sunday May 09 2021")
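One way to make the second string parse is to extend the format string so that it also covers the weekday name. The sketch below is an illustration under that assumption: it uses the %A element for the full weekday name, and the SAFE. prefix to return NULL instead of raising an error when the inputs do not match.

```sql
#standardSQL
SELECT
  -- Format and input now line up: weekday name, month, day, year.
  PARSE_DATE("%A %b %d %Y", "Sunday May 09 2021") AS parsed_date,           -- 2021-05-09
  -- Mismatched inputs: SAFE. turns the parsing error into NULL.
  SAFE.PARSE_DATE("%b %e %Y", "Sunday May 09 2021") AS mismatched_format;   -- NULL
```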
BigQuery SQL Functions: Datetime functions
Besides DATE fields, BigQuery SQL also supports DATETIME functions that behave similarly to the DATE functions, but for DATETIME fields. A DATETIME object represents a date and time as they might be displayed on a calendar or clock, independent of time zone. Let’s see the functions:
CURRENT_DATETIME
CURRENT_DATETIME returns the current date and time for the specified or the default time zone.
SELECT CURRENT_DATETIME() AS today
today |
---|
2021-05-10T10:38:47.046465 |
DATETIME
The DATETIME function is responsible for constructing a DATETIME field from integer values representing the year, month, day, hour, minute, and second.
SELECT DATETIME(2018, 11, 23, 10, 21, 44) AS specific_date
specific_date |
---|
2018-11-23T10:21:44 |
EXTRACT
When you’re using EXTRACT and provide a DATETIME field, it allows you to extract a specific part of the date/time value. The provided part must be any of the below:
- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR
- DAYOFWEEK
- DAY
- DAYOFYEAR
- WEEK
- WEEK(<WEEKDAY>)
- ISOWEEK
- MONTH
- QUARTER
- YEAR
- ISOYEAR
Let’s see a quick example of how to use the EXTRACT function:
SELECT EXTRACT(HOUR FROM DATETIME(2021, 5, 1, 10, 23, 11)) AS hour
hour |
---|
10 |
DATETIME_ADD
DATETIME_ADD adds a specific time interval to an input DATETIME value. The time interval must be provided as an input and can be any of the below:
- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR
- DAY
- WEEK
- MONTH
- QUARTER
- YEAR
You can use the
DATETIME_ADD
function as shown below :
SELECT DATETIME_ADD(DATETIME "2018-11-23 15:30:02", INTERVAL 10 MINUTE) AS ten_minutes_later
ten_minutes_later |
---|
2018-11-23T15:40:02 |
DATETIME_SUB
DATETIME_SUB works similarly to the DATETIME_ADD function, except that instead of adding a specific time interval to the input DATETIME value, it subtracts the interval from it. For example:
SELECT DATETIME_SUB(DATETIME "2018-11-23 15:30:02", INTERVAL 2 HOUR) AS two_hours_earlier
two_hours_earlier |
---|
2018-11-23T13:30:02 |
DATETIME_DIFF
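calculates the number of specified intervals between two DATETIME objects, subtracting the second argument from the first, so the later value goes first to get a positive result. It counts the boundaries of the specified part that lie between the two values, which is why the example returns 3 although slightly less than three hours separate them.
SELECT DATETIME_DIFF(DATETIME '2018-11-25 17:22:42', DATETIME '2018-11-25 14:24:44', HOUR) AS hour_difference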
hour_difference |
---|
3 |
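The order of the two arguments determines the sign of the result. A small sketch with round values (the aliases are illustrative) makes this visible:

```sql
-- The result is first argument minus second argument.
SELECT
  DATETIME_DIFF(DATETIME "2018-11-25 17:00:00", DATETIME "2018-11-25 14:00:00", HOUR) AS later_first,   -- 3
  DATETIME_DIFF(DATETIME "2018-11-25 14:00:00", DATETIME "2018-11-25 17:00:00", HOUR) AS earlier_first  -- -3
```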
FORMAT_DATETIME
formats the input DATETIME field according to the specified format string.
SELECT FORMAT_DATETIME("%b %d, %Y", DATETIME "2018-11-22 15:30:22") AS new_format
new_format |
---|
Nov 22, 2018 |
BigQuery SQL Functions: time functions
BigQuery SQL also supports time functions to help you handle TIME objects. Let’s see the most important ones:
CURRENT_TIME
CURRENT_TIME
returns the current time for the specified or the default time zone.
SELECT CURRENT_TIME() AS now
now |
---|
10:38:47.046465 |
TIME
The
TIME
function constructs a TIME value from integer values representing the hour, minute, and second.
SELECT TIME(10, 21, 44) AS specific_time
specific_time |
---|
10:21:44 |
EXTRACT
When you’re using
EXTRACT
and provide a TIME field, it allows you to extract a specific part of the time value. The part must be one of the following:
- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR
Let’s see a quick example of how to use the
EXTRACT
function:
SELECT EXTRACT(HOUR FROM TIME "10:23:11") AS hour
hour |
---|
10 |
TIME_ADD
TIME_ADD
adds a specific time interval to an input TIME value. The interval must be provided as an input and can be any of the below:
- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR
You can use the
TIME_ADD
function, as shown below:
SELECT TIME_ADD(TIME "15:30:02", INTERVAL 10 MINUTE) AS ten_minutes_later
ten_minutes_later |
---|
15:40:02 |
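Because a TIME value has no date component, results that would cross midnight wrap around instead of failing. A short sketch with illustrative aliases (TIME_SUB, the subtraction counterpart, wraps in the same way):

```sql
-- TIME values stay within a single 24-hour day, so arithmetic wraps at midnight.
SELECT
  TIME_ADD(TIME "23:30:00", INTERVAL 1 HOUR) AS wraps_forward,     -- 00:30:00
  TIME_SUB(TIME "00:15:00", INTERVAL 30 MINUTE) AS wraps_backward  -- 23:45:00
```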
TIME_SUB
TIME_SUB
works similarly to
TIME_ADD
, but instead of adding a specific time interval to the input TIME value, it subtracts the interval from it. For example:
SELECT TIME_SUB(TIME "15:30:02", INTERVAL 2 HOUR) AS two_hours_earlier
two_hours_earlier |
---|
13:30:02 |
TIME_DIFF
TIME_DIFF
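calculates the number of specified intervals between two TIME objects, subtracting the second argument from the first, so the later value goes first to get a positive result. As with DATETIME values, it counts the boundaries of the specified part that lie between the two values.
SELECT TIME_DIFF(TIME '17:22:42', TIME '14:24:44', HOUR) AS hour_difference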
hour_difference |
---|
3 |
FORMAT_TIME
FORMAT_TIME
formats the input TIME field according to the specified format string.
SELECT FORMAT_TIME("%R", TIME "15:30:22") AS new_format
new_format |
---|
15:30 |
PARSE_TIME
PARSE_TIME
converts a string representation of a time to a TIME object. In order for
PARSE_TIME
to work, the two inputs (format string and time string) have to match.
The example below will work because the two inputs match:
SELECT PARSE_TIME("%H:%M:%S", "17:30:02")
On the other hand, the following query will not work because the two inputs do not match:
SELECT PARSE_TIME("%H:%M", "17:30:02")
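Matching also applies to the clock convention: `%H` is the 24-hour element (00-23), while `%I` is the 12-hour element (01-12) and is normally paired with `%p` for AM/PM. The sketch below, with illustrative aliases, parses the same moment written both ways:

```sql
-- Both columns should come back as the same TIME value.
SELECT
  PARSE_TIME("%H:%M:%S", "17:30:02") AS from_24_hour,       -- 17:30:02
  PARSE_TIME("%I:%M:%S %p", "05:30:02 PM") AS from_12_hour  -- 17:30:02
```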
Besides the TIME functions, BigQuery SQL also supports TIMESTAMP functions. Bear in mind that these functions will return a runtime error if overflow occurs; result values are bounded by the defined date and timestamp min and max values. Let’s jump right into the most popular functions and how you can use them:
CURRENT_TIMESTAMP
returns the current date and time as a TIMESTAMP value, displayed in UTC.
SELECT CURRENT_TIMESTAMP() AS now
now |
---|
2021-05-10 21:23:11.120174 UTC |
The
TIMESTAMP
function constructs a TIMESTAMP value from a string representing the timestamp.
SELECT TIMESTAMP("2018-02-13 12:34:56+00") AS specific_date
specific_date |
---|
2018-02-13 12:34:56 UTC |
TIMESTAMP_ADD
adds a particular time interval to an input TIMESTAMP value. The interval must be provided as an input and can be any of the below:
- MICROSECOND
- MILLISECOND
- SECOND
- MINUTE
- HOUR
- DAY
You can use the
TIMESTAMP_ADD
function like this:
SELECT TIMESTAMP_ADD(TIMESTAMP("2018-11-23 15:30:02+00"), INTERVAL 10 MINUTE) AS ten_minutes_later
ten_minutes_later |
---|
2018-11-23 15:40:02 UTC |
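The list of parts for TIMESTAMP_ADD stops at DAY. One possible workaround for calendar units such as MONTH, sketched here under the assumption that UTC is the time zone you want to reason in, is to convert to DATETIME, add the interval there, and convert back:

```sql
-- Sketch: the "UTC" time zone and the alias are choices made for this example.
SELECT
  TIMESTAMP(
    DATETIME_ADD(
      DATETIME(TIMESTAMP "2018-11-23 15:30:02+00", "UTC"),  -- TIMESTAMP -> DATETIME in UTC
      INTERVAL 1 MONTH
    ),
    "UTC"                                                   -- DATETIME -> TIMESTAMP in UTC
  ) AS one_month_later  -- 2018-12-23 15:30:02 UTC
```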
TIMESTAMP_SUB
works similarly to
TIMESTAMP_ADD
, but instead of adding a particular time interval to the input TIMESTAMP value, it subtracts the interval from it. For example:
SELECT TIMESTAMP_SUB(TIMESTAMP("2018-11-23 15:30:02+00"), INTERVAL 2 HOUR) AS two_hours_earlier
two_hours_earlier |
---|
2018-11-23 13:30:02 UTC |
TIMESTAMP_DIFF
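calculates the number of whole specified intervals between two TIMESTAMP objects, subtracting the second argument from the first. Only complete intervals are counted: the two values below are 2 hours, 57 minutes, and 58 seconds apart, so the result is 2.
SELECT TIMESTAMP_DIFF(TIMESTAMP("2018-11-25 17:22:42+00"), TIMESTAMP("2018-11-25 14:24:44+00"), HOUR) AS hour_difference
hour_difference |
---|
2 |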
FORMAT_TIMESTAMP
formats the input TIMESTAMP field according to the specified format string.
SELECT FORMAT_TIMESTAMP("%b %d, %Y", TIMESTAMP "2018-11-22 15:30:22") AS new_format
new_format |
---|
Nov 22, 2018 |
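FORMAT_TIMESTAMP also accepts an optional third argument naming a time zone, which changes how the same instant is rendered. A brief sketch (the time zone and aliases are picked only for illustration):

```sql
-- Same instant, rendered in UTC (the default) and in Los Angeles local time.
SELECT
  FORMAT_TIMESTAMP("%b %d, %Y %H:%M", TIMESTAMP "2018-11-22 15:30:22+00") AS in_utc,  -- Nov 22, 2018 15:30
  FORMAT_TIMESTAMP("%b %d, %Y %H:%M", TIMESTAMP "2018-11-22 15:30:22+00",
                   "America/Los_Angeles") AS in_los_angeles                           -- Nov 22, 2018 07:30
```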
PARSE_TIMESTAMP
converts a string representation of a timestamp to a TIMESTAMP object. In order for
PARSE_TIMESTAMP
to work, the two inputs (format string and timestamp string) have to match.
For example, the query below will work because the two inputs match:
SELECT PARSE_TIMESTAMP("%b %e %I:%M:%S %Y", "Dec 25 07:44:32 2018")
On the other hand, the following query will not work because the two inputs do not match:
SELECT PARSE_TIMESTAMP("%a %b %e %I:%M:%S %Y", "Dec 25 07:44:32 2018")
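When a column may contain strings in inconsistent formats, a failed parse aborts the whole query. One way to soften that, sketched below, is BigQuery's `SAFE.` function prefix, which returns NULL instead of raising an error; the aliases are illustrative.

```sql
-- SAFE. turns a parsing error into NULL so the rest of the query can proceed.
SELECT
  SAFE.PARSE_TIMESTAMP("%b %e %I:%M:%S %Y", "Dec 25 07:44:32 2018") AS matching,      -- 2018-12-25 07:44:32 UTC
  SAFE.PARSE_TIMESTAMP("%a %b %e %I:%M:%S %Y", "Dec 25 07:44:32 2018") AS mismatched  -- NULL
```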
TIMESTAMP_SECONDS
converts the number of seconds since 1970-01-01 00:00:00 UTC to a TIMESTAMP object.
SELECT TIMESTAMP_SECONDS(1589031996) AS timestamp_value
timestamp_value |
---|
2020-05-09 13:46:36 UTC |
TIMESTAMP_MILLIS
converts the number of milliseconds since 1970-01-01 00:00:00 UTC to a TIMESTAMP object.
SELECT TIMESTAMP_MILLIS(1589031996000) AS timestamp_value
timestamp_value |
---|
2020-05-09 13:46:36 UTC |
TIMESTAMP_MICROS
converts the number of microseconds since 1970-01-01 00:00:00 UTC to a TIMESTAMP object.
SELECT TIMESTAMP_MICROS(1589031996000000) AS timestamp_value
timestamp_value |
---|
2020-05-09 13:46:36 UTC |
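The reverse direction is covered by the UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS functions. As a quick sketch (aliases are illustrative), converting the timestamp from the examples above back should give the original integers:

```sql
-- Round trip: TIMESTAMP back to an integer count since 1970-01-01 00:00:00 UTC.
SELECT
  UNIX_SECONDS(TIMESTAMP "2020-05-09 13:46:36+00") AS seconds_since_epoch,  -- 1589031996
  UNIX_MILLIS(TIMESTAMP "2020-05-09 13:46:36+00") AS millis_since_epoch,    -- 1589031996000
  UNIX_MICROS(TIMESTAMP "2020-05-09 13:46:36+00") AS micros_since_epoch     -- 1589031996000000
```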
Read our guide on Google BigQuery Datetime and Timestamp functions to learn more.
BigQuery SQL Functions: Operators
Operators manipulate any number of data inputs and return a result. They are normally represented by special characters or keywords and do not use function call syntax. Below we’ll see the three most common operator types:
Arithmetic Operators
Arithmetic operators manipulate two (or more) numeric inputs. All arithmetic operators accept input of numeric type X, and the result type has type X unless otherwise indicated (for example, dividing two INT64 values returns FLOAT64).
- Addition: X + Y
- Subtraction: X - Y
- Multiplication: X * Y
- Division: X / Y
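The four operators can be tried directly on literals. In this small sketch the aliases are illustrative, and `DIV` is included only to contrast integer division with the `/` operator:

```sql
SELECT
  7 + 2 AS addition,             -- 9
  7 - 2 AS subtraction,          -- 5
  7 * 2 AS multiplication,       -- 14
  7 / 2 AS division,             -- 3.5 (the / operator returns FLOAT64 for INT64 inputs)
  DIV(7, 2) AS integer_division  -- 3
```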
Logical Operators
Logical operators use three-valued logic and produce a result of type BOOL or NULL. BigQuery supports the AND, OR, and NOT logical operators.
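Three-valued logic means that NULL acts as "unknown": the result is NULL only when the known operand is not enough to decide the outcome. A minimal sketch, where `unknown` is an illustrative column holding a NULL of type BOOL:

```sql
SELECT
  TRUE AND unknown AS true_and_null,    -- NULL  (depends on the unknown value)
  FALSE AND unknown AS false_and_null,  -- FALSE (FALSE decides the outcome)
  TRUE OR unknown AS true_or_null,      -- TRUE  (TRUE decides the outcome)
  FALSE OR unknown AS false_or_null,    -- NULL
  NOT unknown AS not_null               -- NULL
FROM (SELECT CAST(NULL AS BOOL) AS unknown)
```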
Comparison Operators
Comparison operators are the most frequently used type of operators. Comparisons always return a result of type BOOL. Below is the list of the operators, along with their results:
- X < Y: Returns TRUE if X is less than Y.
- X <= Y: Returns TRUE if X is less than or equal to Y.
- X > Y: Returns TRUE if X is greater than Y.
- X >= Y: Returns TRUE if X is greater than or equal to Y.
- X = Y: Returns TRUE if X is equal to Y.
- X != Y: Returns TRUE if X is not equal to Y.
- X BETWEEN Y AND Z: Returns TRUE if X is between the range specified (Y and Z)
- X LIKE Y: Checks if the STRING X matches a pattern specified by the second operand Y.
- X IN Y: Returns FALSE if the Y is empty. Returns NULL if X is NULL. Returns TRUE if X exists within the Y struct.
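A few of the operators from the list, applied to literal values. This is only a sketch and the aliases are illustrative:

```sql
SELECT
  5 BETWEEN 1 AND 10 AS in_range,                   -- TRUE (both bounds are inclusive)
  'West Town' LIKE '%West%' AS matches_pattern,     -- TRUE (% matches any sequence of characters)
  3 IN (1, 2, 3) AS in_list,                        -- TRUE
  3 != 3 AS not_equal,                              -- FALSE
  CAST(NULL AS INT64) IN (1, 2, 3) AS null_in_list  -- NULL
```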
BigQuery SQL Functions: Conditional Expressions
Conditional expressions can help you take actions depending on whether a certain condition is met. This can be highly useful when analyzing data in BigQuery. Let’s see the top conditional expressions in action:
CASE – WHEN
The
CASE
expression evaluates the condition of each successive
WHEN
clause and returns the result of the first one that is true. The remaining
WHEN
clauses are not evaluated. If there is no matching
WHEN
clause, the
ELSE
result is returned. If there is no
ELSE
result, then
NULL
is returned. Let’s see that in a quick example with the dataset below:
name | age |
---|---|
John Doe | 37 |
John Davis | 42 |
Jessie Cole | 19 |
Jack Delaney | 74 |
SELECT name, CASE WHEN age > 40 THEN "Senior" ELSE "Junior" END AS level FROM CLIENTS
name | level |
---|---|
John Doe | Junior |
John Davis | Senior |
Jessie Cole | Junior |
Jack Delaney | Senior |
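To see the "first matching WHEN wins" rule, the sketch below adds a second branch. It is self-contained: the `clients` CTE recreates the sample rows inline, and the 'Retired' label and the age threshold of 65 are made up for illustration.

```sql
-- Inline copy of the sample dataset so the query runs without an existing table.
WITH clients AS (
  SELECT 'John Doe' AS name, 37 AS age UNION ALL
  SELECT 'John Davis', 42 UNION ALL
  SELECT 'Jessie Cole', 19 UNION ALL
  SELECT 'Jack Delaney', 74
)
SELECT
  name,
  CASE
    WHEN age >= 65 THEN 'Retired'  -- checked first; Jack Delaney stops here
    WHEN age > 40 THEN 'Senior'    -- only reached when the first condition is not true
    ELSE 'Junior'
  END AS level
FROM clients
-- John Doe: Junior, John Davis: Senior, Jessie Cole: Junior, Jack Delaney: Retired
```

Jack Delaney also satisfies `age > 40`, but because the `age >= 65` branch comes first, the later branch is never evaluated for that row.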
COALESCE
You may never have heard of
COALESCE
, but it’s a fairly simple yet powerful conditional expression.
COALESCE
returns the value of the first non-null expression, and the remaining expressions are not evaluated. For example, if there is no NULL value, then the first value is returned:
SELECT COALESCE('A', 'B', 'C') AS result
result |
---|
A |
If there is a NULL value, then the first non-null value is returned:
SELECT COALESCE(NULL, NULL, 'C') AS result
result |
---|
C |
IF
IF
may be the most popular conditional expression, and it’s pretty straightforward. If the expression evaluates to
TRUE
, then the true-result is returned. Otherwise, the else-result is returned for all other cases.
SELECT IF(52 < 108, 'A', 'C') AS result
result |
---|
A |
IFNULL
IFNULL
evaluates the expression within and, if it’s
NULL
, then it returns the given result. Otherwise, it returns the result of the evaluated expression. Let’s see an example of a
NULL
evaluation:
SELECT IFNULL(NULL, 'C') AS result
result |
---|
C |
If the evaluation is not
NULL
, then this evaluation result is returned :
SELECT IFNULL('A', 'C') as result
result |
---|
A |
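On real tables these expressions are typically applied to nullable columns rather than literals. The following sketch uses a hypothetical `contacts` CTE with invented names and phone numbers to show COALESCE, IFNULL, and IF side by side:

```sql
-- Hypothetical data: the table, its columns, and the phone numbers are made up for this example.
WITH contacts AS (
  SELECT 'John Doe' AS name, CAST(NULL AS STRING) AS mobile, '555-0100' AS landline UNION ALL
  SELECT 'Jessie Cole', '555-0101', '555-0102' UNION ALL
  SELECT 'Jack Delaney', NULL, NULL
)
SELECT
  name,
  COALESCE(mobile, landline, 'no phone') AS best_phone,     -- first non-null of the three
  IFNULL(mobile, 'n/a') AS mobile_or_default,               -- fallback for a single column
  IF(mobile IS NULL, 'missing', 'present') AS mobile_status
FROM contacts
-- John Doe:     555-0100 | n/a      | missing
-- Jessie Cole:  555-0101 | 555-0101 | present
-- Jack Delaney: no phone | n/a      | missing
```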
Where can I use BigQuery SQL?
If you followed this article up to this point, you now have a brief understanding of how to construct your own queries and analyze your data to power up your analytics capabilities and make smart decisions.
Advanced SQL is the greatest weapon an analyst can master to power up the decisions of the organization. Using SQL and BigQuery, you can perform sophisticated analysis, such as user segmentation, or identify the best performing audiences of your site. Marketing and ecommerce KPIs can be analyzed to optimize your efforts, and you can also find which products are the best selling for each season, so you can act proactively, maximizing the potential revenue for your business. BigQuery ML, a set of SQL extensions to support machine learning, deserves particular attention, so we’ve blogged about it in a separate article.
If you don’t have any data in BigQuery yet, you can use Coupler.io to import your data with just a couple of clicks and then start building your analysis.