Distinct Counts
This applies to: Visual Data Discovery
Distinct count functionality determines the number of unique values in a column or expression within a selected table by comparing all the records pulled from the data store by a data source configuration. When distinct counts are used, unique value results are returned when analyzing data. For example, distinct counts could return the number of:
Unique customers in a sales database
Unique UPC codes for a category of products
The number of trucks in a company's fleet
For example, given a single collection and string field with the following three values:
- Apple
- Orange
- Apple
The distinct count returns 2, since there are only two distinct values (“Apple” and “Orange”), while an ordinary count returns 3 to reflect the total number of records. SQL-based connectors might produce a query that looks like this:
select count(distinct myField) from myCollection
Support for this feature by connector is shown in the following table.
Key:Y - Supported; N - Not Supported; N/A - not applicable
Connector | Supported? | Notes |
---|---|---|
Amazon Redshift | Y | |
Amazon S3 | Y | |
Apache Drill | Y | |
Apache Phoenix | Y | |
Apache Phoenix Query Server (QS) | Y | |
Apache Solr | Y | |
BigQuery | Y | If you need to access a BigQuery partition, explicitly include an alias for the built in partition column in your select clause, such as |
Cloudera Impala | Y | Cloudera Impala connectors can receive only a single distinct count field in a query. |
Cloudera Search | Y | |
Couchbase | Y | |
Dremio | Y | |
Elasticsearch 7.0 | Y | |
Elasticsearch 8.0 | Y | |
File Upload | Y | |
HDFS | Y | |
Hive | Y | |
Jira | Y | |
MemSQL | Y | |
Microsoft SQL Server | Y | |
MongoDB | Y | |
MySQL | Y | |
Oracle | Y | |
PostgreSQL | Y | |
Python | Y | |
Real Time Sales | Y | |
Salesforce | Y | |
SAP Hana | Y | |
SAP IQ | Y | |
Spark SQL | Y | |
Snowflake | Y | |
Teradata | Y | |
TIBCO DV | Y | |
Trino | Y | |
File Upload (Upload API) | Y | |
Vertica | Y |