The Warehouse is a managed Redshift solution which has the ability to store data from multiple origin data sources. Each Redshift cluster can contain multiple databases, and the Warehouse stores all relevant tables pulled in from the Cache UI in a database titled "site_#####" where the numbers indicate your Site ID.
If you're logged in via psql, you can type
SELECT datname FROM pg_database; (SQL)
to see the list of databases. The relevant database in the Warehouse which is connected to Sisense for Cloud Data Teams can be found in the cluster info page, using the above command/query, or within Sisense for Cloud Data Teams' IDE, using the following SQL query:
Warehouse Naming Conventions
How does the Warehouse enable cross-database joins between tables coming from different databases? It does so by putting all tables within the "site_####" database, so the joins are no longer actually cross database - all tables reside within the same database of the Warehouse. First, the origin database is given an ID, and the schema name is concatenated to the database ID, which combines the schema and db alias into a single schema within the "site_####" db.
Let's walk through an example.
Suppose I had a Warehouse with a db "site_1" which pulls in data from a postgres db called "production_replica," (assume the alias "db_5" is given by the Warehouse) and a table "public.daily_active_users" within "production_replica".
Within the Sisense for Cloud Data Teams IDE, if we write the query:
select * from production_replica.public.daily_active_users
the Warehouse will translate this query on the backend to the actual Warehouse table names :
This can be seen any time under the "Query" tab by running a select * against a table.
Warehouse Names, Sortkeys, and Distkeys
The following is a useful Warehouse query which will yield the database name (within the Warehouse), the schema name (within the origin database), and the primary sortkey and diststyle of each table in the Warehouse (specifically, the site_#### database). Additionally, the table size (GB) and a running cumulative table size are generated. (SQL)
(select distinct oid
from pg_class c
where id in (select oid from tbl_ids)
group by id),
(select attrelid,sum(case when a.attencodingtype=0 then 0 else 1 end)
as encoded_cols,count(*)as cols
from pg_attribute a
where a.attrelid in (select oid from tbl_ids)
group by a.attrelid),
(select count(distinct node) node_count
coalesce(regexp_substr(ti.schema, 'db_[0-9]+'), ti.schema) as origin_database_aliased_id,
right(ti.schema, len(ti.schema) - len(regexp_substr(ti.schema, 'db_[0-9]+_'))) as schema_name,
round(1.0*ti.size / 1024,2) current_size_gb,
sum(round(1.0*ti.size / 1024,2)) over (order by size desc rows unbounded preceding) as cumulative_size_on_disk
from svv_table_info ti
left join stp on stp.id=ti.table_id
left join colenc on colenc.attrelid=ti.table_id
cross join cluster_info
where ti.schema not in('pg_internal')
order by ti.size desc