Working with column types can be challenging in Power Query. The default conversion of types may not always work well for you, and may create future refresh errors, when the column names change. I have blogged about it in details here, and usually recommend to use explicit type conversion, and remove the automatic Changed Type detection step.

Nevertheless, there are advanced scenarios (especially for me when I explore new datasets with dozens of numeric columns), where the explicit manual conversion of types is too time consuming, but going back to the automatic detection is not feasible (For example: When the first rows are not sufficient to detect the column types, and fill with null values, Power Query will end up with an Any type, which is not very helpful).

Wouldn’t it be useful to have your own automatic detection of columns? A technique that you can control, and will not fail to refresh, when the column names changes?

Here is the solution:

You can copy paste the following function into a blank query:

(SourceTable)=>
let
    
FnDetectType =
    (SourceTable, ColumnName)=>
    let
        ListColumn = Table.Column(SourceTable,ColumnName),
        NumberPercentage =
            List.Count(
                List.Select(ListColumn , each _ is number)
            ) / List.Count(ListColumn),
        IntPercentage =
            List.Count(
                List.Select(ListColumn , each _ is number and _ = Number.Round(_, 0))
            ) / List.Count(ListColumn),
        DatePercentage =
            List.Count(
                List.Select(ListColumn , each _ is date)
            ) / List.Count(ListColumn),
        DateTimePercentage =
            List.Count(
                List.Select(ListColumn , each _ is datetime)
            ) / List.Count(ListColumn),
        Max = List.Max(
            {NumberPercentage, DatePercentage, DateTimePercentage, IntPercentage}
        ),
        Result =
            if Max < .95 then
                {ColumnName, type any}
            else if DateTimePercentage = Max then
                {ColumnName, type datetime}
            else if DatePercentage = Max then
                {ColumnName, type date}
            else if IntPercentage = Max then
                {ColumnName, Int64.Type}
            else
                {ColumnName, type number}
    in
        Result,

    ConvertTypesList =
        List.Transform(
            Table.ColumnNames(SourceTable),
            each FnDetectType(SourceTable, _)
        ),
    ConvertedTable = Table.TransformColumnTypes(SourceTable, ConvertTypesList)
in
    ConvertedTable

You can run it on any table, and it will convert columns to DateTime, Date, Whole Number and Decimal Number types, if more than 95% of the values are matching the relevant type. You can also change the threshold from 95% in the code.

To run this function, follow these steps:

Create a blank query in Power Query Editor, and paste this function into the Advanced Editor. Name the function as FnDetectAndTransformTypes. Select the query with the table you want to convert, and select the fx button (right to the formula bar in Power Query Editor. Oh, and if you don’t see the formula bar, enable it in View tab).

Now, in the formula bar, let’s say that you see the formula bar:

= Source

All you need to do, is change it to:

= FnDetectAndTransformTypes(Source)

You are done. You can now see that the type detected automatically.

Automatic Detection of Column Types in Power Query using M

Note: This technique will be slow on large datasets. To run it on a preview of the data – let’s say 1000 rows – you can change the following line:

ListColumn = Table.Column(SourceTable,ColumnName),

to

ListColumn = List.FirstN(Table.Column(SourceTable,ColumnName), 1000),

Deep Dive – Understanding the M Expression

The FnDetectAndTransformTypes function receives a table as an argument, to detect the column types and convert them. In its inner implementation, there is another function – FnDetectType, which receives the table, and a single column to detects its type. FnDetectType is divided into two sections. In the first section, we calculate the percentages of decimal numbers, whole numbers, dates, and datetime in the column. For example, if your column consists of 10 rows, with two date values in it, the percentage of dates is 20%, and the DatePercentage identifier below will hold the 20%.

        ListColumn = Table.Column(SourceTable,ColumnName),
        NumberPercentage =
            List.Count(
                List.Select(ListColumn , each _ is number)
            ) / List.Count(ListColumn),
        IntPercentage =
            List.Count(
                List.Select(ListColumn , each _ is number and _ = Number.Round(_, 0))
            ) / List.Count(ListColumn),
        DatePercentage =
            List.Count(
                List.Select(ListColumn , each _ is date)
            ) / List.Count(ListColumn),
        DateTimePercentage =
            List.Count(
                List.Select(ListColumn , each _ is datetime)
            ) / List.Count(ListColumn),

In the preceding section, you can see that for each percentage type, we count the number of items in ListColumn (which is the list-representation as the column) in the denominator. In the numerator, we count the number of items in ListColumn, after we apply a filter on the list, and keep only items of a specific type. For example: We apply the condition is number to filter the list, and keep only decimal numbers in the expression:

List.Select(ListColumn , each _ is number)

To calculate the percentage of whole numbers, we first check if the value is a number, and then check if it is equal to the value of we round it down (Because when you round down a whole number, you get the same whole number as a result).

List.Select(ListColumn , each _ is number and _ = Number.Round(_, 0))

The reason why we first check if the value is a number, before we round it down, is to avoid errors, when Number.Round will be applied on values which are not numbers. When the first condition is not met, M will not evaluate the second condition.

In the next section of FnDetectType round, we find the maximal percentage that was detected. Then, if the maximal percentage is lower than 0.95 (You can set a different threshold if you wish, by replacing .95 with a different threshold), we’ll return a strange artifact: A list with two elements: The column name, and type any, for example: { Column1, type any }. Why do we need to return such a list? You will understand it at the end of this post.

If the maximal percentage is above or equal to 0.95, and the DateTimePercentage is the maximal percentage, we will return the list with the column name, and the datetime type. Otherwise, if the DatePercentage is the maximal percentage, we will return a list with the column name with the date type. Otherwise, if IntPercentage is the maximal percentage, we will return the list with the column name and Int64.Type (whole number). Otherwise, we will return the list with the column name and the number type (decimal number).

Do you know why, we first check if the maximal percentage is a whole number, before we check if the maximal percentage is decimal number?

        Max = List.Max(
            {NumberPercentage, DatePercentage, DateTimePercentage, IntPercentage}
        ),
        Result =
            if Max < .95 then
                {ColumnName, type any}
            else if DateTimePercentage = Max then
                {ColumnName, type datetime}
            else if DatePercentage = Max then
                {ColumnName, type date}
            else if IntPercentage = Max then
                {ColumnName, Int64.Type}
            else
                {ColumnName, type number}

Now, when FnDetectType returns the a list with the column name and the type of the column, we can run it on all the columns. This is done using List.Transform. which is invoked with the list of column names (Table.ColumnNames) as the first argument, and FnDetectType as the second argument. The result in ConvertTypesList is a list of nested lists, each with the column name and corresponding column type.

Finally, we apply column type conversion by running the function Table.TransformColumnTypes. This function is used, when you apply a manual column conversion. The second argument of Table.TransformColumnTypes, requires the exact format that we created in ConvertTypesList

(SourceTable)=>
let
...
    ConvertTypesList =
        List.Transform(
            Table.ColumnNames(SourceTable),
            each FnDetectType(SourceTable, _)
        ),
    ConvertedTable = Table.TransformColumnTypes(SourceTable, ConvertTypesList)
in
    ConvertedTable

The result is the converted column types. Hope you find it helpful.

Looking forward for your feedback in the comments section.