How does Prism decide what the data type of each field is when I connect to a data source?
This is a question that keeps popping up, usually because it has direct implications on the type of analysis you can later perform on these fields. More specifically:
- Numeric Fields can act as regular dimensions as well as measures, with any type of aggregation (Sum, Avg, Min, Max, Count, DistinctCount)
- Textual Fields can act as regular dimensions as well as measures, limited to the Count and DistinctCount aggregations.
- Date/Time Fields can act as Time dimensions, Regular Dimensions and measures, limited to the Count and DistinctCount aggregations.
Identifying Database Table Fields
Most serious databases such as SQL Server, Oracle, MySQL etc provide detailed schema information about the tables they hold. When connecting to these types of sources, Prism simply asks the database what data type each field is configured to.
Identifying OLAP Cube Fields
OLAP cubes are handled a little differently than other data sources as they are already structured within a dimensional model. When connecting to an OLAP cube, Prism correlates a dimension for each cube dimension, and a measure for each cube measure.
Identifying Excel Range Fields
In the case of Excel, identifying field types is a little more tricky. This is
because the creator of the Excel file can mix up different types of values in
different types of formats within the same field.
In order to determine the type of field, Prism will search for the first
non-empty cell in the relevant fields (after skipping the title row) and
evaluate two things:
- The formatting applied to the cell. If the cell is formatted in a way
Prism recognizes (Date/Time formatting, currency, numeric value with two decimal
places, etc), the field type will be determined according to that format.
- If Prism does not recognize the applied formatting, it will try to figure
out the data type according to the value itself.
This methodology is not 100% accurate, but it usually is.
However, since Prism will get confused sometimes (for example, if the first
non-empty cell holds an irrelevant value that does not match in type with the
rest of the field values, or if you have more than one title row), Prism allows
you to define all types of dimensions/measures on any field in Excel. If during
the synchronization process (where Prism brings the Excel data into its own
belly) a value that does not match the dimension/measure configuration is
encountered - You can either fix this value, or skip the row it belongs to.
Elad
SiSense