Excel and COM using Python

Excel is such a ubiquitous tool these days that it is hard to imagine how people would do various tasks without it. Unfortunately, doing things by hand can be error prone and slow compared to doing things programatically. As such, the subject of interacting with Excel from various programming languages comes up often on stack overflow as well as the associated site code review. Rightly so people take to automating spreadsheets to speed things up and to make them less error prone. The questions are varied but often the questions are related to slow performance. I was on Code Review and looking at a few questions when I noticed that there were two in a row that were basically the same and related to slow Interop. I wrote an answer to one but thought it would be a good a time as any to document the issue a little better and share a few tips that I have used in the past to speed things up substantially.

If you have written applications that interact with Excel using COM or just written VBA macros inside the provided Visual Basic editor you may have noticed that it can be slow due to overhead involved. The basics of avoiding this slowness are very similar weither you use internal VBA macros or code that uses the COM interface. Over the years I have used both although the VBA side much more often than COM. I have used several methods when writing simple VBA macros that have turned spreadsheet macros from horribly slow to sometimes very fast. A few of these things can be applied to COM access. So before we get to COM Interop, some general tips about dealing with Excel programatically are in order. For this we will use VBA inside of Excel to demonstrate a couple issues.

Simple Excel Overhead

There is an overhead to doing changes to a value on a sheet.

  1. Formulas tend to recalculate when things change.
  2. In order to show changes to the user, the screen refreshes.

To explore this, lets look at a simple example. Suppose you have a sheet in which you have values in column A and column B. You want to add them up and put the result in column C. Granted this is a very simple example that you could argue someone should not use code for but it is an easy way to demonstrate the concept. So.. lets consider a sheet with values in A, B, C and a formula in D. In our example there are about 30,000 rows..

A B C D
1 1 0 =C1/B1

The following code can be used to calculate a value for C

Sub SimpleRanges()

Dim ColA As Range
Dim ColB As Range
Dim ColC As Range
Dim Counter As Long
Dim start As Long

start = Timer

For Counter = 2 To ActiveSheet.UsedRange.Rows.Count

    Set ColA = ActiveSheet.Cells(Counter, 1)
    Set ColB = ActiveSheet.Cells(Counter, 2)
    Set ColC = ActiveSheet.Cells(Counter, 3)

    ColC = ColA + ColB
    ColA = ColC - ColB
    ColB = ColA

Next

MsgBox ("Runtime : " + Str(Timer - start))

End Sub

If your results are anything like mine when running code like this then you will get a result that gives a messagebox and says it takes around 70 seconds.

Avoiding Simple overhead

We can do some simple things to avoid some of the Excel overhead. The simple ways to avoid the first two items above is to turn the updating off near the beginning of your code. This can be done in the example above by adding two simple lines

Application.ScreenUpdating = False
Application.Calculation = xlManual

This will disable both of those in this particular workbook. You should remember to turn them back on after you are done doing your calculations

Application.ScreenUpdating = True
Application.Calculation = xlAutomatic

The code now looks like this

Sub SimpleRanges2()

Dim ColA As Range
Dim ColB As Range
Dim ColC As Range
Dim Counter As Long
Dim start As Long

start = Timer
Application.ScreenUpdating = False
Application.Calculation = xlManual

For Counter = 2 To ActiveSheet.UsedRange.Rows.Count

    Set ColA = ActiveSheet.Cells(Counter, 1)
    Set ColB = ActiveSheet.Cells(Counter, 2)
    Set ColC = ActiveSheet.Cells(Counter, 3)

    ColC = ColA + ColB
    ColA = ColC - ColB
    ColB = ColA

Next

Application.ScreenUpdating = True
Application.Calculation = xlAutomatic

MsgBox ("Runtime : " + Str(Timer - start))

End Sub

Code with these changes is significantly faster. If we run it we would find that in our example it will run in about 3.5 seconds. So just by adding a couple of lines we have code now that is about 20 times faster. This is a pretty serious performance change from such a simple addition to the code that it is a very worthwhile thing to do.

Faster Still

To make this really fast though, we need to limit the amount of time spent working our way through the Excel objects. Right now, we are still accessing each cell on each iteration. In order to address it we will need to minimize the interaction between the code we are running and Excel. To do this we are going to alter the code to only get the values and put them back when we are done.

Sub SimpleArray()

Dim ColA As Variant
Dim ColB As Variant
Dim ColC As Variant
Dim Counter As Long
Dim EndNum As Long
Dim start As Long

start = Timer
Application.ScreenUpdating = False
Application.Calculation = xlManual

EndNum = ActiveSheet.UsedRange.Rows.Count

ColA = ActiveSheet.Range(Cells(1, 1), Cells(EndNum, 1)).Value
ColB = ActiveSheet.Range(Cells(1, 2), Cells(EndNum, 2)).Value
ColC = ActiveSheet.Range(Cells(1, 3), Cells(EndNum, 3)).Value

For Counter = 1 To EndNum

    ColC(Counter, 1) = ColA(Counter, 1) + ColB(Counter, 1)
    ColA(Counter, 1) = ColC(Counter, 1) - ColB(Counter, 1)
    ColB(Counter, 1) = ColA(Counter, 1)

Next

ActiveSheet.Range(Cells(1, 1), Cells(EndNum, 1)) = ColA
ActiveSheet.Range(Cells(1, 2), Cells(EndNum, 2)) = ColB
ActiveSheet.Range(Cells(1, 3), Cells(EndNum, 3)) = ColC

Application.ScreenUpdating = True
Application.Calculation = xlAutomatic

MsgBox ("Runtime : " + Str(Timer - start))

End Sub

In our example the messagebox gives odd numbers sometimes. It runs in far less than a second. It runs so fast that our timing method sometimes does not record it properly.

So, this is all great and fine but what does this have to do with accessing Excel from COM?

Adding COM Overhead

The overhead associated with COM can be much higher than the method we were using. In our example we noted that the screen updating, calculation, and working with each cell individually adds a tremendous amount of time. Taking that knowledge and applying it when we access Excel will make a huge difference in the time required to run our code.

The examples that follow are done in Python 3 using the pywin32 module. They are all running using Python 3.4. We are going to avoid the pitfalls we discovered when using VBA and limit the amount of time we spend going back and forth between the COM/Excel layer. Our simple code might look like this

import time
import win32com.client as win32


def simple_ranges():
    """ Open Excel using COM """
    starttime = time.time()

    xl = win32.gencache.EnsureDispatch('Excel.Application')
    wb = xl.Workbooks.Open('c:\programs\python\Excel_COM.xlsm')
    sh = wb.ActiveSheet

    end_num = sh.UsedRange.Rows.Count

    for i in range(1, end_num + 1):
        sh.Cells(i, 3).Value = sh.Cells(i, 1).Value + sh.Cells(i, 2).Value
        sh.Cells(i, 1).Value = sh.Cells(i, 3).Value - sh.Cells(i, 2).Value
        sh.Cells(i, 2).Value = sh.Cells(i, 1).Value

    wb.Save()
    xl.Quit()

    print("Runtime " + "{:.2f}".format(round(time.time() - starttime, 1)))

It takes a very long time... about 450-500 seconds as it goes back and forth through the COM and Excel layers.

We could apply our findings about screen updating and calculation but the screen is not visible by default. If it does anything at all when coming from COM I wouldn't know since I have not done it enough to tell. So lets skip right to limiting the back and forth through layers and just throw the others in for good measure.

It gives us a final piece of code that looks like this

import time
import win32com.client as win32


def simple_array():
    """ Open Excel using COM """
    starttime = time.time()
    xlCalculationManual = -4135
    xlCalculationAutomatic = -4105

    xl = win32.gencache.EnsureDispatch('Excel.Application')
    wb = xl.Workbooks.Open('c:\programs\python\Excel_COM.xlsm')
    sh = wb.ActiveSheet

    xl.Calculation = xlCalculationManual
    xl.ScreenUpdating = False

    end_num = sh.UsedRange.Rows.Count

    col_a = sh.Range(xl.Cells(1, 1), xl.Cells(end_num, 1)).Value
    col_b = sh.Range(xl.Cells(1, 2), xl.Cells(end_num, 2)).Value
    col_c = sh.Range(xl.Cells(1, 3), xl.Cells(end_num, 3)).Value

    cells_a = [0] * end_num
    cells_b = [0] * end_num
    cells_c = [0] * end_num

    for i in range(end_num - 1):
        cells_c[i] = col_a[i][0] + col_b[i][0]
        cells_a[i] = cells_c[i] - col_b[i][0]
        cells_b[i] = cells_a[i]

    sh.Range(xl.Cells(1, 1), xl.Cells(end_num, 1)).Value = cells_a
    sh.Range(xl.Cells(1, 2), xl.Cells(end_num, 2)).Value = cells_b
    sh.Range(xl.Cells(1, 3), xl.Cells(end_num, 3)).Value = cells_c

    xl.ScreenUpdating = True
    xl.Calculation = xlCalculationAutomatic

    wb.Save()
    xl.Quit()

    print("Runtime " + "{:.2f}".format(round(time.time() - starttime, 1)))

This runs in about 8 seconds.

There are improvements that could be made to the code, I am certainly not saying that this exact code is the best way to go about interacting with Excel however I am showing that it is much faster than interacting with each cell as we do calculations or set values into specific ranges and cells. Changes that could be further made have been left out since this ties as closely as I could think between the different code pieces in the two languages. The main point I was hoping to show is that doing things in batches when transfering over the COM interface in order to reduce the interaction can make a huge difference and I think in that regard the code has served it's purpose.