Fact to Fact JOINS where Querying more than 1 Fact table #9209
Comments
Hi @ElisabethPA 👋 Great question! Also, thanks for the data model example and the query; they really helped me reproduce the issue. I think the issue here is the direction of the joins. If you:
For the same query, you'll get the following generated SQL, where all fact tables are left-joined to the dim table:
SELECT
q_0."dim_product__product_name",
"fact_sales__total_sales" "fact_sales__total_sales",
"fact_targets__total_targets" "fact_targets__total_targets"
FROM
(
SELECT
"keys"."dim_product__product_name",
sum("fact_sales_key__fact_sales".sales_value) "fact_sales__total_sales"
FROM
(
SELECT
DISTINCT "fact_sales_key__dim_product".product_name "dim_product__product_name",
"fact_sales_key__fact_sales".sale_id "fact_sales__sale_id"
FROM
(
SELECT
*
FROM
(
VALUES
(1001, 'Laptop', 'Electronics'),
(1002, 'Phone', 'Electronics'),
(1003, 'Desk', 'Furniture')
) AS t(product_id, product_name, category)
) AS "fact_sales_key__dim_product"
LEFT JOIN (
SELECT
*
FROM
(
VALUES
(1, 1001, 500, '2024-02-01'),
(2, 1002, 700, '2024-02-02'),
(3, 1001, 800, '2024-02-03'),
(4, 1003, 600, '2024-02-04')
) AS t(sale_id, product_id, sales_value, sale_date)
) AS "fact_sales_key__fact_sales" ON "fact_sales_key__dim_product".product_id = "fact_sales_key__fact_sales".product_id
LEFT JOIN (
SELECT
*
FROM
(
VALUES
(1, 1001, 1500, '2024-02'),
(2, 1002, 1200, '2024-02'),
(3, 1003, 1000, '2024-02')
) AS t(
target_id,
product_id,
target_value,
target_month
)
) AS "fact_sales_key__fact_targets" ON "fact_sales_key__dim_product".product_id = "fact_sales_key__fact_targets".product_id
) AS "keys"
LEFT JOIN (
SELECT
*
FROM
(
VALUES
(1, 1001, 500, '2024-02-01'),
(2, 1002, 700, '2024-02-02'),
(3, 1001, 800, '2024-02-03'),
(4, 1003, 600, '2024-02-04')
) AS t(sale_id, product_id, sales_value, sale_date)
) AS "fact_sales_key__fact_sales" ON "keys"."fact_sales__sale_id" = "fact_sales_key__fact_sales".sale_id
GROUP BY
1
) as q_0
INNER JOIN (
SELECT
"keys"."dim_product__product_name",
sum("fact_targets_key__fact_targets".target_value) "fact_targets__total_targets"
FROM
(
SELECT
DISTINCT "fact_targets_key__dim_product".product_name "dim_product__product_name",
"fact_targets_key__fact_targets".target_id "fact_targets__target_id"
FROM
(
SELECT
*
FROM
(
VALUES
(1001, 'Laptop', 'Electronics'),
(1002, 'Phone', 'Electronics'),
(1003, 'Desk', 'Furniture')
) AS t(product_id, product_name, category)
) AS "fact_targets_key__dim_product"
LEFT JOIN (
SELECT
*
FROM
(
VALUES
(1, 1001, 500, '2024-02-01'),
(2, 1002, 700, '2024-02-02'),
(3, 1001, 800, '2024-02-03'),
(4, 1003, 600, '2024-02-04')
) AS t(sale_id, product_id, sales_value, sale_date)
) AS "fact_targets_key__fact_sales" ON "fact_targets_key__dim_product".product_id = "fact_targets_key__fact_sales".product_id
LEFT JOIN (
SELECT
*
FROM
(
VALUES
(1, 1001, 1500, '2024-02'),
(2, 1002, 1200, '2024-02'),
(3, 1003, 1000, '2024-02')
) AS t(
target_id,
product_id,
target_value,
target_month
)
) AS "fact_targets_key__fact_targets" ON "fact_targets_key__dim_product".product_id = "fact_targets_key__fact_targets".product_id
) AS "keys"
LEFT JOIN (
SELECT
*
FROM
(
VALUES
(1, 1001, 1500, '2024-02'),
(2, 1002, 1200, '2024-02'),
(3, 1003, 1000, '2024-02')
) AS t(
target_id,
product_id,
target_value,
target_month
)
) AS "fact_targets_key__fact_targets" ON "keys"."fact_targets__target_id" = "fact_targets_key__fact_targets".target_id
GROUP BY
1
) as q_1 ON (
q_0."dim_product__product_name" = q_1."dim_product__product_name"
OR (
q_0."dim_product__product_name" IS NULL
AND q_1."dim_product__product_name" IS NULL
)
)
ORDER BY
2 DESC
LIMIT
5000
Does this help? Also, which database do you use with Cube? Is it MS SQL?
Thank you for the quick response. What we expect to be a correct SQL query is the following. As you can see, we reduced the usage of fact tables to only once per subquery. According to the execution plans we tested, both with your example and with our suggested example below, the performance will be much better with our suggestion. This is important to us since our fact tables contain millions of rows. We tried the same model in Power BI, and the queries Power BI generates are aligned with our query below. So, is it possible to get the same results in CubeDev?
SELECT
Thanks!
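For readers following the thread, the query shape described in this comment (each fact table scanned once per subquery) can be sketched roughly as below. This is an illustrative sketch only: it is not the poster's actual query and not SQL that Cube emits. It assumes the sample rows from the generated SQL earlier in the thread, loaded into an in-memory SQLite database, and the aliases `d`, `s`, and `t` are made up for the example.

```python
import sqlite3

# Assumption: the sample rows from the thread, loaded into in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER, product_name TEXT, category TEXT);
CREATE TABLE fact_sales (sale_id INTEGER, product_id INTEGER,
                         sales_value INTEGER, sale_date TEXT);
CREATE TABLE fact_targets (target_id INTEGER, product_id INTEGER,
                           target_value INTEGER, target_month TEXT);
INSERT INTO dim_product VALUES
  (1001, 'Laptop', 'Electronics'),
  (1002, 'Phone', 'Electronics'),
  (1003, 'Desk', 'Furniture');
INSERT INTO fact_sales VALUES
  (1, 1001, 500, '2024-02-01'),
  (2, 1002, 700, '2024-02-02'),
  (3, 1001, 800, '2024-02-03'),
  (4, 1003, 600, '2024-02-04');
INSERT INTO fact_targets VALUES
  (1, 1001, 1500, '2024-02'),
  (2, 1002, 1200, '2024-02'),
  (3, 1003, 1000, '2024-02');
""")

# Each fact table is aggregated to the dimension key exactly once,
# and only the small aggregated results are joined to the dimension.
rows = conn.execute("""
SELECT d.product_name, s.total_sales, t.total_targets
FROM dim_product AS d
LEFT JOIN (
    SELECT product_id, SUM(sales_value) AS total_sales
    FROM fact_sales
    GROUP BY product_id
) AS s ON s.product_id = d.product_id
LEFT JOIN (
    SELECT product_id, SUM(target_value) AS total_targets
    FROM fact_targets
    GROUP BY product_id
) AS t ON t.product_id = d.product_id
ORDER BY s.total_sales DESC
""").fetchall()

for row in rows:
    print(row)
```

Note that this sketch groups by `product_id`, so it only matches a grouping by `product_name` when names are unique in the dimension table; otherwise an outer `GROUP BY` on the name would be needed.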
Two fact tables are joined indirectly in the same query, even though no fact-to-fact JOIN is defined in the example .js file.
We are migrating from Microsoft multidimensional OLAP to CubeDev. Our backend data is in star schema format. We have many facts and dimensions, and we are not able to merge the fact tables because they have different granularities. In OLAP cubes we can query two measures from different facts in the same MDX query without any problem, while in CubeDev this is a major blocker for us.
The structure of the SQL query generated by CubeDev slows down performance: the first fact table is joined with the common dimension, the second fact table is joined with the same dimension, and both of these subqueries are then combined with another JOIN. In practice this produces a cross join between tables that, for us, potentially have millions of rows.
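To make the row multiplication concrete, here is a minimal sketch that joins the dimension to both fact tables in one pass and aggregates directly. It assumes the sample rows from the generated SQL in this thread, loaded into an in-memory SQLite database; the aliases `d`, `s`, and `t` are hypothetical, and the query is a simplified illustration rather than the SQL CubeDev generates.

```python
import sqlite3

# Assumption: the sample rows from the thread, loaded into in-memory SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER, product_name TEXT, category TEXT);
CREATE TABLE fact_sales (sale_id INTEGER, product_id INTEGER,
                         sales_value INTEGER, sale_date TEXT);
CREATE TABLE fact_targets (target_id INTEGER, product_id INTEGER,
                           target_value INTEGER, target_month TEXT);
INSERT INTO dim_product VALUES
  (1001, 'Laptop', 'Electronics'),
  (1002, 'Phone', 'Electronics'),
  (1003, 'Desk', 'Furniture');
INSERT INTO fact_sales VALUES
  (1, 1001, 500, '2024-02-01'),
  (2, 1002, 700, '2024-02-02'),
  (3, 1001, 800, '2024-02-03'),
  (4, 1003, 600, '2024-02-04');
INSERT INTO fact_targets VALUES
  (1, 1001, 1500, '2024-02'),
  (2, 1002, 1200, '2024-02'),
  (3, 1003, 1000, '2024-02');
""")

# Joining both facts to the dimension pairs every sale with every target
# of the same product, so the row count grows multiplicatively per product.
JOINED = """
FROM dim_product AS d
LEFT JOIN fact_sales AS s ON d.product_id = s.product_id
LEFT JOIN fact_targets AS t ON d.product_id = t.product_id
"""

joined_rows = conn.execute("SELECT COUNT(*) " + JOINED).fetchone()[0]

# Summing directly over the joined rows counts a target once per matching sale.
naive_targets = dict(conn.execute(
    "SELECT d.product_name, SUM(t.target_value) " + JOINED +
    "GROUP BY d.product_name"
).fetchall())

print(joined_rows)              # 4 joined rows for 3 products
print(naive_targets["Laptop"])  # 3000, although the real target is 1500
```

With two sales and one target, Laptop already appears twice in the joined result; with millions of fact rows per product, the intermediate result grows with the product of the per-key row counts in each fact table.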
Is there any workaround to avoid LEFT JOINs in simple queries where more than one fact table is involved?
Here is our example:
Tables used:
Query we tried from our side:
The generated SQL query involves LEFT JOINs on the fact tables, which slows down performance when the tables have millions of rows.
The problem for us is the inner query, which shows that the two fact tables in the example above are cross-joined. Is there any way to avoid it?
Example of .js file here: