Automatic Detection of Column Types in #PowerQuery

Working with column types can be challenging in Power Query. The default conversion of types may not always work well for you, and may create future refresh errors, when the column names change. I have blogged about it in details here, and usually recommend to use explicit type conversion, and remove the automatic Changed Type detection step.

Nevertheless, there are advanced scenarios (especially for me when I explore new datasets with dozens of numeric columns), where the explicit manual conversion of types is too time consuming, but going back to the automatic detection is not feasible (For example: When the first rows are not sufficient to detect the column types, and fill with null values, Power Query will end up with an Any type, which is not very helpful).

Wouldn’t it be useful to have your own automatic detection of columns? A technique that you can control, and will not fail to refresh, when the column names changes?

Here is the solution:

You can copy paste the following function into a blank query:

You can run it on any table, and it will convert columns to DateTime, Date, Whole Number and Decimal Number types, if more than 95% of the values are matching the relevant type. You can also change the threshold from 95% in the code.

To run this function, follow these steps:

Create a blank query in Power Query Editor, and paste this function into the Advanced Editor. Name the function as FnDetectAndTransformTypes. Select the query with the table you want to convert, and select the fx button (right to the formula bar in Power Query Editor. Oh, and if you don’t see the formula bar, enable it in View tab).

Now, in the formula bar, let’s say that you see the formula bar:

= Source

All you need to do, is change it to:

= FnDetectAndTransformTypes(Source)

You are done. You can now see that the type detected automatically.

Automatic Detection of Column Types in Power Query using M

Note: This technique will be slow on large datasets. To run it on a preview of the data – let’s say 1000 rows – you can change the following line:

ListColumn = Table.Column(SourceTable,ColumnName),

to

ListColumn = List.FirstN(Table.Column(SourceTable,ColumnName), 1000),

In the next blog post, I will provide more details on the M expression above. Looking forward for your feedback.

6 comments

  1. GMF Reply

    A good way to demonstrate the abilities of PowerQuery. I have data sets with a fair number of null values however so the 95% target is hard to match even if the non-nulls are a consistent data type. Is there a way to eliminate nulls from the denominator? It’s probably a variation of the List.Count command but I’m not sure which side of the parentheses it should be on…

  2. GMF Reply

    Here’s something that appears to work. But I would like to be able to distinguish between Any and Text types…

    (SourceTable, NumRows)=>
    let

    FnDetectType =
    (SourceTable, ColumnName)=>
    let
    ListColumn = if NumRows = 0 then
    Table.Column(SourceTable,ColumnName)
    else List.FirstN(Table.Column(SourceTable, ColumnName), NumRows),
    ListNulls = List.Count(List.Select(ListColumn , each _ is null)),
    ListNonNullCols = List.Count(ListColumn) – ListNulls,
    NumberPercentage =
    List.Count(
    List.Select(ListColumn , each _ is number)
    ) / ListNonNullCols,
    IntPercentage =
    List.Count(
    List.Select(ListColumn , each _ = Number.Round(_, 0))
    ) / ListNonNullCols,
    DatePercentage =
    List.Count(
    List.Select(ListColumn , each _ is date)
    ) / ListNonNullCols,
    DateTimePercentage =
    List.Count(
    List.Select(ListColumn , each _ is datetime)
    ) / ListNonNullCols,
    Max = List.Max({NumberPercentage, DatePercentage, DateTimePercentage}),
    Result =
    if Max < .95 then
    {ColumnName, type any}
    else if DateTimePercentage = Max then
    {ColumnName, type datetime}
    else if DatePercentage = Max then
    {ColumnName, type date}
    else if IntPercentage = Max then
    {ColumnName, Int64.Type}
    else
    {ColumnName, type number}
    in
    Result,

    ConvertTypesList =
    List.Transform(
    Table.ColumnNames(SourceTable),
    each FnDetectType(SourceTable, _)
    ),
    ConvertedTable = Table.TransformColumnTypes(SourceTable, ConvertTypesList)
    in
    ConvertedTable

    • Gil Raviv Post authorReply

      Thank you for sharing. What is your logic to differentiate between Any and Text?

  3. GMF Reply

    I have no idea how to distinguish between them. I suppose like Sherlock Holmes, once you have eliminated the impossible (numbers and dates) whatever remains is text! Text may be a safer bet and require fewer reversals than Any… Just about any data type will load as Text. I don’t know if you could test for Boolean, Duration, or Binary either since Boolean would be more likely to be a textual “TRUE” “FALSE” choice. Duration or Boolean would more likely be the outcome of a Power Query operation too rather than an input.

Leave a Reply