
BQuant [2] - Correlation Matrices
Intro
A fairly common exploratory task is to generate a pairwise correlation matrix of some series. In this example, we’ll compute the correlation of quarterly revenue for members of SPX, along with the correlation of quarter-over-quarter percent changes in revenue.
This example assumes you’re using BQuant (Desktop or Enterprise).
We’ll accomplish this using two different tools: first with Pandas, one of the fundamental data tools in Python, and then with DuckDB, which seamlessly provides an in-memory analytical SQL database.
I prefer the expressiveness and declarative structure of SQL over Pythonic dataframe libraries, but that’s personal preference, so I’ll show both.
BQL Query: Members of SPX
In BQL, we’ll use members('SPX Index') to list the constituents of SPX, then pull quarterly revenue over the trailing 11 quarters:
get(sales_rev_turn)
for(members('SPX Index'))
with(fpo=range(-10Q, 0Q), fpt=Q, fill=prev)
preferences(addcols=all)
- members('SPX Index') returns the local exchange tickers for members of the index. If you need the fundamental tickers instead, you'd use translatesymbols(members('SPX Index'), targetidtype='FUNDAMENTALTICKER'), as in the sketch after this list.
- Instead of members('SPX Index'), you could also pass a list of securities: for(['IBM US Equity', 'MSFT US Equity'])
- addcols=all: returns the PERIOD and PERIOD_OFFSET columns.
- Aligning fiscal periods: you'll have to decide what alignment you want. In this case, I'll use fiscal period offsets, which aligns each series starting from its most recently reported fiscal quarter. Some judgment is needed to decide what you want.
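As referenced in the first bullet, a sketch of the fundamental-ticker variant (untested here; it simply composes the pieces shown above):
get(sales_rev_turn)
for(translatesymbols(members('SPX Index'), targetidtype='FUNDAMENTALTICKER'))
with(fpo=range(-10Q, 0Q), fpt=Q, fill=prev)
preferences(addcols=all)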
Let’s see the code
View this Notebook directly at: https://github.com/paultiq/bqnt_examples/blob/main/bqnt_corr.ipynb
All the data points here are fake / randomized.
bqnt_corr.ipynb
Setup
Import any required libraries.
import bql
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
pio.templates["iqmo"] = pio.templates["plotly"]
pio.templates["iqmo"].layout.margin = dict(l=50, r=50, t=50, b=50)
pio.templates["iqmo"].layout.height = 250
pio.templates.default = "iqmo"
bql_svc = bql.Service()
quarters = 10
index = "SPX Index"
corr_query = f"""get(sales_rev_turn)
for(members('{index}'))
with(fpo=range(-{quarters-1}Q, 0Q), fpt=Q, fill=prev)
preferences(addcols=all)"""
response = bql_svc.execute(corr_query)
display(response)
spx_rev_df = bql.combined_df(response)
| ID | PERIOD | PERIOD_OFFSET | sales_rev_turn |
| --- | --- | --- | --- |
| CO1 FAKE | 2022 Q3 | 0Q | 58.600295 |
| CO1 FAKE | 2022 Q2 | -1Q | 180.148199 |
| CO1 FAKE | 2022 Q1 | -2Q | 559.713096 |
| CO1 FAKE | 2021 Q4 | -3Q | 460.075362 |
| CO1 FAKE | 2021 Q3 | -4Q | 601.315225 |
| ... | ... | ... | ... |
| CO500 FAKE | 2021 Q2 | -5Q | 419.912698 |
| CO500 FAKE | 2021 Q1 | -6Q | 390.696043 |
| CO500 FAKE | 2020 Q4 | -7Q | 348.535213 |
| CO500 FAKE | 2020 Q3 | -8Q | 179.162726 |
| CO500 FAKE | 2020 Q2 | -9Q | 26.443303 |

5000 rows × 3 columns
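Before pivoting, it's worth confirming the panel is balanced. A quick check (a sketch, not part of the original notebook; it assumes the ID index and column names shown above):
# Every security should have one non-null observation per fiscal period offset
counts = spx_rev_df.groupby(level="ID")["sales_rev_turn"].count()
assert counts.nunique() == 1, "some securities are missing quarters"
print(f"{len(counts)} securities x {counts.iloc[0]} quarters each")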
# Pivot the data to wide: one column per security
df_pivoted = spx_rev_df.pivot_table(
    index="PERIOD_OFFSET", columns="ID", values="sales_rev_turn"
)
# Compute the correlation matrix
correlation_matrix = df_pivoted.corr()
# Show only 5 rows / 5 columns, for blog
display(correlation_matrix.head(5).iloc[:, :5])
| ID | CO1 FAKE | CO10 FAKE | CO100 FAKE | CO101 FAKE | CO102 FAKE |
| --- | --- | --- | --- | --- | --- |
| CO1 FAKE | 1.000000 | -0.106820 | -0.296994 | -0.589395 | -0.239758 |
| CO10 FAKE | -0.106820 | 1.000000 | -0.217948 | 0.079673 | 0.390123 |
| CO100 FAKE | -0.296994 | -0.217948 | 1.000000 | 0.475114 | -0.521637 |
| CO101 FAKE | -0.589395 | 0.079673 | 0.475114 | 1.000000 | 0.022925 |
| CO102 FAKE | -0.239758 | 0.390123 | -0.521637 | 0.022925 | 1.000000 |
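One caveat: DataFrame.corr() computes each pairwise correlation over whatever observations the two columns share, so with missing data a cell can rest on very few quarters. pandas accepts a min_periods argument to mask thinly supported pairs; the threshold of 8 below is an arbitrary choice for illustration:
# Require at least 8 overlapping quarters per pair; report NaN otherwise
correlation_matrix_strict = df_pivoted.corr(min_periods=8)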
# Compute the period-over-period percent change of sales_rev_turn
spx_rev_df_sorted = spx_rev_df.sort_values(by=["ID", "PERIOD"])
# Compute percent change within each security (grouping by the ID index)
spx_rev_df_sorted["rev_pct_chg"] = spx_rev_df_sorted.groupby(level="ID")[
    "sales_rev_turn"
].pct_change()
df_pivoted = spx_rev_df_sorted.pivot_table(
    index="PERIOD_OFFSET", columns="ID", values="rev_pct_chg"
)
correlation_matrix = df_pivoted.corr()
display(correlation_matrix.head(5).iloc[:, :5])
| ID | CO1 FAKE | CO10 FAKE | CO100 FAKE | CO101 FAKE | CO102 FAKE |
| --- | --- | --- | --- | --- | --- |
| CO1 FAKE | 1.000000 | -0.331428 | -0.245076 | -0.363978 | -0.084547 |
| CO10 FAKE | -0.331428 | 1.000000 | -0.144900 | 0.407989 | 0.328154 |
| CO100 FAKE | -0.245076 | -0.144900 | 1.000000 | -0.237802 | -0.252401 |
| CO101 FAKE | -0.363978 | 0.407989 | -0.237802 | 1.000000 | 0.446727 |
| CO102 FAKE | -0.084547 | 0.328154 | -0.252401 | 0.446727 | 1.000000 |
small_matrix = correlation_matrix.head(5).iloc[:, :5]
px.imshow(small_matrix, text_auto=True)
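The per-cell text only makes sense for a small slice. For the full 500×500 matrix, a color-only heatmap is more practical; the height below simply overrides the small default set in the template earlier:
# Full matrix: skip text_auto and enlarge the figure
px.imshow(correlation_matrix, height=700)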
DuckDB
The following is an example using DuckDB. DuckDB is a powerful in-memory analytics engine with a lot of features: it's great for problems that fit within a single server.
I prefer the declarative syntax of SQL and BQL query strings to the various Pythonic APIs and DataFrame libraries. SQL queries can express complex, multi-"step" transformations in a single operation.
This is just to demonstrate feasibility. The Pandas example is certainly the more concise solution in this case.
# This conditional code is only for blogging purposes:
# To avoid reinstalling dependencies multiple times.
# Don't do this in real code. Load your dependencies in a Custom Env.
try:
    import duckdb
except Exception as e:
    print(f"Error importing ({e}); installing and trying again")
    %pip install duckdb
    import duckdb
# Drop the pandas index, to make it easier to work with in duckdb.
spx_rev_df_c = spx_rev_df.reset_index()
df_corr = duckdb.sql("""
with q1 as
-- First, compute the pairwise correlations
(SELECT a.id AS id1,
        b.id AS id2,
        CORR(a.sales_rev_turn, b.sales_rev_turn) AS correlation
 FROM spx_rev_df_c a
 JOIN spx_rev_df_c b ON a.period_offset = b.period_offset
 GROUP BY a.id, b.id
 ORDER BY a.id, b.id
)
pivot q1 on id2 using last(correlation) group by id1 order by id1
""").df()
display(df_corr.set_index("id1").head(5).iloc[:, :5])
| id1 | CO1 FAKE | CO10 FAKE | CO100 FAKE | CO101 FAKE | CO102 FAKE |
| --- | --- | --- | --- | --- | --- |
| CO1 FAKE | 1.000000 | -0.106820 | -0.296994 | -0.589395 | -0.239758 |
| CO10 FAKE | -0.106820 | 1.000000 | -0.217948 | 0.079673 | 0.390123 |
| CO100 FAKE | -0.296994 | -0.217948 | 1.000000 | 0.475114 | -0.521637 |
| CO101 FAKE | -0.589395 | 0.079673 | 0.475114 | 1.000000 | 0.022925 |
| CO102 FAKE | -0.239758 | 0.390123 | -0.521637 | 0.022925 | 1.000000 |
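The pandas section also correlated percent changes. The same can be sketched in DuckDB with a LAG window function; this sketch assumes the PERIOD strings ("2022 Q3") sort chronologically, as they do in the fake data above:
df_corr_chg = duckdb.sql("""
with chg as
-- Quarter-over-quarter percent change per security
(SELECT id,
        period_offset,
        sales_rev_turn / LAG(sales_rev_turn) OVER
            (PARTITION BY id ORDER BY period) - 1 AS rev_pct_chg
 FROM spx_rev_df_c
),
q1 as
(SELECT a.id AS id1,
        b.id AS id2,
        CORR(a.rev_pct_chg, b.rev_pct_chg) AS correlation
 FROM chg a
 JOIN chg b ON a.period_offset = b.period_offset
 GROUP BY a.id, b.id
)
pivot q1 on id2 using last(correlation) group by id1 order by id1
""").df()
display(df_corr_chg.set_index("id1").head(5).iloc[:, :5])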