This topic comes up a lot on LinkedIn. Or rather, it does not when it should. And then excuses are made, on the assumption that no alternative exists.

But robust alternatives in Excel do exist, and have existed for decades. For example, ADO (ActiveX Data Objects) as referred to below, and more modern forms of ‘data fetch’ in the Fetch functionality in JavaScript-derivatives such as TypeScript which is the language in Microsoft’s Office Script that we have in Excel on the Web.

In the social media discussions that revolve around the slowness of Power Query, the pivotal point that is not mentioned often enough is, are the things that slow Power Query down necessary for the particular application? The response I usually get is a blank expression, as it to say ‘what alternative is there’?

I use this illustration to make this point. If you have a regular weekly trip from A to B for a meeting, and every week you detour to go via C so that another attendee can be picked up. It has the added benefit of a 10 minute discussion in the car prior to the meeting to discuss tactics. Okay. But this week he is not going to attend as he on holiday. Do you still make that detour to C?

The problem with the social media centred Excel-awareness is that most people do not see Power Query as a detour. They see it as an incredibly powerful tool. It is. And it is accessible to those who are attracted to ‘comprehensive wizards that are visually prominent’. But it is at a cost. And most people do not see this.

What happens that makes Power Query slow?

The primary reason Power Query may be slower than ADO (ActiveX Data Objects) when importing data from a relational database into Excel is due to the additional transformation, data loading, and optimization steps that Power Query performs compared to ADO’s more direct data fetching approach.

Here’s a breakdown of what happens in between that contributes to the slower performance:

1. Data Shaping and Transformation Overhead

Power Query is designed not just to pull in raw data but to enable users to perform various data transformations (such as filtering, merging, pivoting, and more) as part of the import process. Even if you’re not explicitly transforming data, Power Query may still perform optimization checks and ensure the data is formatted correctly for further transformations. ADO, on the other hand, is a simpler data connection and retrieval mechanism that brings in the data more directly without performing these operations.

2. M Query Translation and Execution

Power Query uses its own language, M, to describe the transformations and queries. When you import data using Power Query, it translates the steps into an M query and often pushes these steps back to the database (in the form of SQL for relational databases) to perform transformations at the data source itself. However, this translation layer can introduce latency, especially if the translation is complex or if the database does not optimize the generated SQL query well.

ADO, conversely, executes SQL queries or stored procedures directly and retrieves the results without going through the same translation and optimization process.

3. Query Folding (or Lack of It)

Power Query tries to “fold” the M query into native SQL to push as much of the computation back to the database as possible. However, not all transformations in Power Query can be folded into SQL. If Power Query is unable to fold certain parts of the query, those transformations happen locally in Power Query, which is slower as it processes data within Excel, not on the database server.

ADO, by contrast, relies on you providing a SQL query that the database will execute directly. Since there is no intermediate step of trying to fold an M query, ADO doesn’t have this overhead.

4. Data Type Conversion and Schema Detection

Power Query tries to automatically detect and convert data types, and it handles more complex metadata (column names, types, relationships) compared to ADO. This additional processing can cause delays. ADO typically just imports the data in its raw format, leaving the data type conversions and schema handling to Excel later if necessary.

5. Asynchronous Processing and Caching

Power Query loads data asynchronously and sometimes caches parts of the data to improve performance on repeated queries. While caching can improve performance in some cases (when data is reused), it can also add overhead to the initial data load. ADO, on the other hand, does not have such mechanisms; it simply retrieves the data as quickly as the connection allows.

6. Complex User Interface Integration

Power Query integrates more deeply into Excel’s user interface, with options for refreshing queries, adjusting the data model, and more. While these features are beneficial for end users, they add some overhead that you wouldn’t see in simpler ADO-based imports, which are purely about data retrieval.

7. Connection and Query Overhead

Power Query establishes connections and handles security and query execution in a more user-friendly way, with built-in connectors and user prompts. ADO, being a more “barebones” method, establishes a connection quickly and directly runs the query without any additional layers.

Summary of Key Differences:

  • Power Query:
  • Transforms and shapes data.
  • Translates M queries and folds them into SQL (when possible).
  • Handles metadata, caching, and type conversion.
  • Integrated into Excel’s data model and UI.
  • ADO:
  • Directly executes SQL and retrieves raw data.
  • Minimal or no transformations at retrieval.
  • Simpler and more direct data import.

How to Optimize Power Query Performance:

To mitigate some of the performance issues, you can take steps such as:

  • Ensuring query folding is fully supported by simplifying your transformations.
  • Reducing the number of transformations in Power Query, especially complex ones.
  • Pulling smaller, more targeted datasets rather than large tables with unnecessary columns or rows.

Ultimately, Power Query trades some speed for flexibility and ease of use, which is why it may be slower than ADO for purely data retrieval purposes.

You’ve been listening to a podcast by Hiran de Silva. Read by Bill.

Related Articles

Hiran de Silva

View all posts

Add comment

Your email address will not be published. Required fields are marked *