
BQuant [2] - Correlation Matrices
Intro
A fairly common exploratory task is to generate a pairwise correlation matrix of some series. In this example, we’ll compute the correlation of quarterly revenue for members of SPX, along with the correlation of quarter-over-quarter percent changes in revenue.
This example assumes you’re using BQuant (Desktop or Enterprise).
We’ll accomplish this using two different tools: first with Pandas, one of the fundamental data tools in Python, and then with DuckDB, which seamlessly provides an in-memory analytical SQL database.
I prefer the expressiveness and declarative structure of SQL over Pythonic dataframe libraries, but that’s personal preference, so I’ll show both.
BQL Query: Members of SPX
In BQL, we’ll use members('SPX Index') to list the constituents of SPX, then pull quarterly revenue over the trailing 11 quarters:
get(sales_rev_turn)
for(members('SPX Index'))
with(fpo=range(-10Q, 0Q), fpt=Q, fill=prev)
preferences(addcols=all)
- members('SPX Index') returns the local exchange tickers for members of the index. If you need the fundamental tickers instead, you'd use translatesymbols(members('SPX Index'), targetidtype='FUNDAMENTALTICKER'), as in the sketch after this list.
- Instead of members('SPX Index'), you could also pass a list of securities: for(['IBM US Equity', 'MSFT US Equity'])
- addcols=all: returns the PERIOD and PERIOD_OFFSET columns.
- Aligning fiscal periods: you'll have to decide what alignment you want. In this case, I'll use fiscal period offsets, which aligns each series starting from its most recently reported fiscal quarter. Some judgment is needed to decide what you want.
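As referenced in the first bullet, a sketch of the fundamental-ticker variant (untested here; it simply composes the pieces shown above):
get(sales_rev_turn)
for(translatesymbols(members('SPX Index'), targetidtype='FUNDAMENTALTICKER'))
with(fpo=range(-10Q, 0Q), fpt=Q, fill=prev)
preferences(addcols=all)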
Let’s see the code
View this Notebook directly at: https://github.com/paultiq/bqnt_examples/blob/main/bqnt_corr.ipynb
All the data points here are fake / randomized.
bqnt_corr.ipynb
Setup
Import any required libraries.
import bql
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
pio.templates["iqmo"] = pio.templates["plotly"]
pio.templates["iqmo"].layout.margin = dict(l=50, r=50, t=50, b=50)
pio.templates["iqmo"].layout.height = 250
pio.templates.default = "iqmo"
bql_svc = bql.Service()
quarters = 10
index = "SPX Index"
corr_query = f"""get(sales_rev_turn)
for(members('{index}'))
with(fpo=range(-{quarters-1}Q, 0Q), fpt=Q, fill=prev)
preferences(addcols=all)"""
response = bql_svc.execute(corr_query)
display(response)
spx_rev_df = bql.combined_df(response)
| ID | PERIOD | PERIOD_OFFSET | sales_rev_turn |
| --- | --- | --- | --- |
| CO1 FAKE | 2022 Q3 | 0Q | 58.600295 |
| CO1 FAKE | 2022 Q2 | -1Q | 180.148199 |
| CO1 FAKE | 2022 Q1 | -2Q | 559.713096 |
| CO1 FAKE | 2021 Q4 | -3Q | 460.075362 |
| CO1 FAKE | 2021 Q3 | -4Q | 601.315225 |
| ... | ... | ... | ... |
| CO500 FAKE | 2021 Q2 | -5Q | 419.912698 |
| CO500 FAKE | 2021 Q1 | -6Q | 390.696043 |
| CO500 FAKE | 2020 Q4 | -7Q | 348.535213 |
| CO500 FAKE | 2020 Q3 | -8Q | 179.162726 |
| CO500 FAKE | 2020 Q2 | -9Q | 26.443303 |

5000 rows × 3 columns
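Before pivoting, it's worth confirming the panel is balanced. A quick check (a sketch, not part of the original notebook; it assumes the ID index and column names shown above):
# Every security should have one non-null observation per fiscal period offset
counts = spx_rev_df.groupby(level="ID")["sales_rev_turn"].count()
assert counts.nunique() == 1, "some securities are missing quarters"
print(f"{len(counts)} securities x {counts.iloc[0]} quarters each")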
# Pivot the data to wide: one column per security
df_pivoted = spx_rev_df.pivot_table(
    index="PERIOD_OFFSET", columns="ID", values="sales_rev_turn"
)
# Compute the correlation matrix
correlation_matrix = df_pivoted.corr()
# Show only 5 rows / 5 columns, for blog
display(correlation_matrix.head(5).iloc[:, :5])
| ID | CO1 FAKE | CO10 FAKE | CO100 FAKE | CO101 FAKE | CO102 FAKE |
| --- | --- | --- | --- | --- | --- |
| CO1 FAKE | 1.000000 | -0.106820 | -0.296994 | -0.589395 | -0.239758 |
| CO10 FAKE | -0.106820 | 1.000000 | -0.217948 | 0.079673 | 0.390123 |
| CO100 FAKE | -0.296994 | -0.217948 | 1.000000 | 0.475114 | -0.521637 |
| CO101 FAKE | -0.589395 | 0.079673 | 0.475114 | 1.000000 | 0.022925 |
| CO102 FAKE | -0.239758 | 0.390123 | -0.521637 | 0.022925 | 1.000000 |
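One caveat: DataFrame.corr() computes each pairwise correlation over whatever observations the two columns share, so with missing data a cell can rest on very few quarters. pandas accepts a min_periods argument to mask thinly supported pairs; the threshold of 8 below is an arbitrary choice for illustration:
# Require at least 8 overlapping quarters per pair; report NaN otherwise
correlation_matrix_strict = df_pivoted.corr(min_periods=8)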
# Compute the period-over-period percent change of sales_rev_turn
spx_rev_df_sorted = spx_rev_df.sort_values(by=["ID", "PERIOD"])
# Compute percent change within each security (grouping by the ID index)
spx_rev_df_sorted["rev_pct_chg"] = spx_rev_df_sorted.groupby(level="ID")[
    "sales_rev_turn"
].pct_change()
df_pivoted = spx_rev_df_sorted.pivot_table(
    index="PERIOD_OFFSET", columns="ID", values="rev_pct_chg"
)
correlation_matrix = df_pivoted.corr()
display(correlation_matrix.head(5).iloc[:, :5])
| ID | CO1 FAKE | CO10 FAKE | CO100 FAKE | CO101 FAKE | CO102 FAKE |
| --- | --- | --- | --- | --- | --- |
| CO1 FAKE | 1.000000 | -0.331428 | -0.245076 | -0.363978 | -0.084547 |
| CO10 FAKE | -0.331428 | 1.000000 | -0.144900 | 0.407989 | 0.328154 |
| CO100 FAKE | -0.245076 | -0.144900 | 1.000000 | -0.237802 | -0.252401 |
| CO101 FAKE | -0.363978 | 0.407989 | -0.237802 | 1.000000 | 0.446727 |
| CO102 FAKE | -0.084547 | 0.328154 | -0.252401 | 0.446727 | 1.000000 |
small_matrix = correlation_matrix.head(5).iloc[:, :5]
px.imshow(small_matrix, text_auto=True)
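The per-cell text only makes sense for a small slice. For the full 500×500 matrix, a color-only heatmap is more practical; the height below simply overrides the small default set in the template earlier:
# Full matrix: skip text_auto and enlarge the figure
px.imshow(correlation_matrix, height=700)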
DuckDB
The following is an example using DuckDB. DuckDB is a powerful in-memory analytics engine with a lot of features: it's great for problems that fit within a single server.
I prefer the declarative syntax of SQL and BQL query strings to the various Pythonic APIs and DataFrame libraries. SQL queries can express complex, multi-"step" transformations in a single operation.
This is just to demonstrate feasibility. The Pandas example is certainly the more concise solution in this case.
# This conditional code is only for blogging purposes:
# To avoid reinstalling dependencies multiple times.
# Don't do this in real code. Load your dependencies in a Custom Env.
try:
    import duckdb
except Exception as e:
    print(f"Error importing ({e}); installing and trying again")
    %pip install duckdb
    import duckdb
# Drop the pandas index, to make it easier to work with in duckdb.
spx_rev_df_c = spx_rev_df.reset_index()
df_corr = duckdb.sql("""
with q1 as
-- First, compute the pairwise correlations
(SELECT a.id AS id1,
        b.id AS id2,
        CORR(a.sales_rev_turn, b.sales_rev_turn) AS correlation
 FROM spx_rev_df_c a
 JOIN spx_rev_df_c b ON a.period_offset = b.period_offset
 GROUP BY a.id, b.id
 ORDER BY a.id, b.id
)
pivot q1 on id2 using last(correlation) group by id1 order by id1
""").df()
display(df_corr.set_index("id1").head(5).iloc[:, :5])
| id1 | CO1 FAKE | CO10 FAKE | CO100 FAKE | CO101 FAKE | CO102 FAKE |
| --- | --- | --- | --- | --- | --- |
| CO1 FAKE | 1.000000 | -0.106820 | -0.296994 | -0.589395 | -0.239758 |
| CO10 FAKE | -0.106820 | 1.000000 | -0.217948 | 0.079673 | 0.390123 |
| CO100 FAKE | -0.296994 | -0.217948 | 1.000000 | 0.475114 | -0.521637 |
| CO101 FAKE | -0.589395 | 0.079673 | 0.475114 | 1.000000 | 0.022925 |
| CO102 FAKE | -0.239758 | 0.390123 | -0.521637 | 0.022925 | 1.000000 |
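The pandas section also correlated percent changes. The same can be sketched in DuckDB with a LAG window function; this sketch assumes the PERIOD strings ("2022 Q3") sort chronologically, as they do in the fake data above:
df_corr_chg = duckdb.sql("""
with chg as
-- Quarter-over-quarter percent change per security
(SELECT id,
        period_offset,
        sales_rev_turn / LAG(sales_rev_turn) OVER
            (PARTITION BY id ORDER BY period) - 1 AS rev_pct_chg
 FROM spx_rev_df_c
),
q1 as
(SELECT a.id AS id1,
        b.id AS id2,
        CORR(a.rev_pct_chg, b.rev_pct_chg) AS correlation
 FROM chg a
 JOIN chg b ON a.period_offset = b.period_offset
 GROUP BY a.id, b.id
)
pivot q1 on id2 using last(correlation) group by id1 order by id1
""").df()
display(df_corr_chg.set_index("id1").head(5).iloc[:, :5])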