Excel and Python integration

In Software Development


In today's article we would like to take a closer look at integration of Python and MS Excel. This question may arise, for example, when creating some online reporting system that needs to output results in a common format or some other tasks.
alt
Editorial Commitee Qualified.One,
Management
alt
"

Also in this article we will show the backward integration, i.e. how to use a function written in python in Excel, which can also be useful for automating reports.

Working with MS Excel files in Python

To work with Excel files from Python, I know of 2 options:

  • Using libraries such as xlrd, xlwt, xlutils or openpyxl
  • Working with a com-object

Let's look at working with these methods in more detail. We will use an excel file from which we first read the data from the first cell and then write them to the second cell. These simple examples will be enough for the first acquaintance.

Using libraries

So, the first method is quite simple and well described. For example, there is an excellent article to describe how to work with xlrd, xlwt, xlutils. So in this article I will give a short snippet of code using them.

To begin, load the required libraries and open the xls file for reading and select The right sheet of data:

import xlrd, xlwt
#open file
rb = xlrd.open_workbook('../ArticleScripts/ExcelPython/xl.xls',formatting_info=True)

#Select active sheet
sheet = rb.sheet_by_index(0)

Now let's see how to read the values from the correct cells:


#Get the value of the first cell A1
val = sheet.row_values(0)[0]

#receive list of values from all records
vals = [sheet.row_values(rownum) for rownum in range(sheet.nrows)]

As you can see reading the data is not difficult. Now let's write them to another file. To do this, create a new excel file with a new workbook:


wb = xlwt.Workbook()

ws = wb.add_sheet('Test')

Write the previously received data into a new file and save the changes:


# in A1 we write the value from A1 of the previous file
ws.write(0, 0, val[0])

# into column B we write our sequence from column A of the source file
i = 0
for rec in vals:
    ws.write(i,1,rec[0])
    i =+ i

#save workbook
wb.save('../ArticleScripts/ExcelPython/xl_rec.xls')

As you can see from the example above, the xlrd library is responsible for reading data, while xlwt is responsible for writing, so there is no way to make changes to an already created book without copying it to a new one. Besides that, these libraries only work with xls format files (Excel 2003) and don't support new xlsx format (Excel 2007 and higher).

You will need openpyxl library to work successfully with xlsx format. To demonstrate how it works, let's follow the steps shown for previous libraries.

Start by loading the library and selecting the desired book and worksheet


import openpyxl
wb = openpyxl.load_workbook(filename = '../ArticleScripts/ExcelPython/openpyxl.xlsx')
sheet = wb['test']

As you can see from the listing above, this is easy to do. Now let's see how we can read the data:


#read the value of a specific cell
val = sheet['A1'].value

#read the defined range
vals = [v[0].value for v in sheet.range('A1:A2')]

The difference from previous libraries is that openpyxl allows us to map to cells and sequences via their names, which is quite convenient and easy to understand when reading the program.

Now let's see how we can write and save the data:


#Write a value in a specific cell
sheet['B1'] = val

# we write the sequence
i = 0
for rec in vals:
    sheet.cell(row=i, column=2).value = rec
    i =+ 1

# save data
wb.save('../ArticleScripts/ExcelPython/openpyxl.xlsx')

From the example above, you can see that saving is pretty easy, too. Moreover, in the above code, you can see that openpyxl is able to work with cell names and indexes.

The disadvantage of this library is that, as in the previous example, it is not possible to save changes without creating a new book.

As shown above, for more or less full-fledged work with excel files, in this case, you need 4 libraries, and it is not always convenient. Besides, you may need access to VBA (let's say for some further processing) and you can't get it with the help of these libraries.

However, working with these libraries is quite easy and convenient for quick creation of Excel files and their formatting, but if you need more possibilities, the next sub-item is for you.

Copy data from one Excel sheet to another

how to copy data from one Excel sheet to a target Excel workbook using the openpyxl module in Python.

To work with Excel files we need openpyxl which is a Python library used to read, write and modify Excel files (with extension xlsx / xlsm / xltx / xltm). It can be installed with the following command:

Sudo pip3 install openpyxl

To copy one Excel file to another we first open both source and target Excel files. We then calculate the total number of rows and columns in the source Excel file, read the value of one cell and store it in a variable, and then write that value to the destination Excel file at the same cell position as the cell position in the source file. The final file will be saved.

Algorithm: openpyxl copy worksheet to another workbook

  • 1) Import the openpyxl library as xl.
  • 2) Open the original Excel file using the path where it is located.
  • Note: The path must be a string and contain a double backslash (\\) instead of a single backslash (\). For example: The path must be C:\\Users\\Desktop\\source.xlsx instead of C:\Users\Admin\Desktop\source.xlsx
  • 3) Open the required worksheet for copying, using its index. The 'n-th' worksheet index equals n-1.
  • 4) Open the target Excel file and the active worksheet in it.
  • 5) Calculate the total number of rows and columns in the source Excel file.
  • 6) Use two for loops (one to go through the rows and one to go through the columns of the Excel file) to read the value of a cell in the source file into a variable and then write it to a cell in the target file from that variable.
  • 7) Save the target file.

# importing openpyxl module
import openpyxl as xl;

# opening the source excel file
filename ="C:\\Users\\\\Admin\\\\\Desktop\\trading.xlsx"
wb1 = xl.load_workbook(filename)
ws1 = wb1.worksheets[0]

# opening the destination excel file
filename1 ="C:{\Users\\\\Admin\\\\Desktop\\test.xlsx"
wb2 = xl.load_workbook(filename1)
ws2 = wb2.active

# calculate total number of rows and
# columns in source excel file
mr = ws1.max_row
mc = ws1.max_column

# copying the cell values from source
# excel file to destination excel file
for i in range (1, mr + 1):
for j in range (1, mc + 1):
 # reading cell values from source excel file
 c = ws1.cell(row = i, column = j)

 # writing the read value to destination excel file
 ws2.cell(row = i, column = j).value = c.value

# saving the destination excel file
wb2.save(str(filename1))

Working with a com-object

In my reports, I prefer to use the second method, which is to use an Excel file via a com-object using the win32com library. The advantage of this method is that you can perform all the operations with the file that regular Excel can do using VBA.

Calling Python functions from MS Excel

You may have a situation where you already have a function that processes data in python and you want to transfer its functionality to Excel. Of course, you can rewrite it in VBA, but why?

Excel has an excellent add-in for using python functions in Excel. With it you can call python functions directly from Excel.

So, let's assume we have a function written in python that we want to use:


def get_unique(lists)
 sm = 0
for i in lists:
 sm = sm + int(i.pop())
return sm

It takes a list of lists as input, this is one of the conditions that must be fulfilled for this function to work in Excel.

Let's save the function in plugin.py and put it in the same directory where our excel file we are going to work with will be located.

Now let's install ExcelPython. The installation is done by running the exe file and is easy to do.

When all the preparations are done, open the test excel file and call the VBA editor (Alt+F11). In order to work with the above mentioned add-in it is necessary to connect it, via Tools->References.

Well, now we can start writing a wrapper function for our Python plugin.py. It looks like this:


Function sr(lists As Range)     On Error GoTo do_error         Set plugin = PyModule("plugin", AddPath:=ThisWorkbook.Path)         Set result = PyCall(plugin, "get_unique", PyTuple(lists.Value2))         sr = WorksheetFunction.Transpose(PyVar(result))         Exit Function do_error:         sr = Err.Description End Function

So what happens in this function?

First, using PyModule, we connect the required module. To do this, we give the module's name without an extension and the path to the folder it is located in as a parameter to the function. In the output of PyModule we get an object to work with the module.

Then, using PyCall, we call the desired function from the specified module. PyCall gets the following as parameters:

  • The module object obtained in the previous step
  • The name of the called function
  • Parameters passed to the function (passed as a list)

The PyTuple function, receives some values as input and converts them into a Python tuple object.
Well, accordingly, PyVar performs an operation to convert the result of the python function, to a type comprehensible to Excel.

Now, to make sure our combination works, let's call our freshly baked function on an Excel sheet.

Let's illustrate this with the same task as the previous examples. First, let's load the required library and create a COM object.

import win32com.client
Excel = win32com.client.Dispatch("Excel.Application")

Now we can work with Excel object we can access all the features of VBA. Let's start by opening any book and select the active sheet. This can be done like this:


wb = Excel.Workbooks.Open(u'D:\\\Scripts\\\\DataScience\\\ArticleScripts\\\ExcelPython\\xl.xls')
sheet = wb.ActiveSheet

Let's get the value of the first cell and the sequence:


#Get the value of the first cell
val = sheet.Cells(1,1).value

#gets the value of A1:A2
vals = [r[0].value for r in sheet.Range("A1:A2")]

As you can see, we are operating with pure VBA functions here. This is very handy if you have macros written and want to use them when working with Python with a minimum of code rework.

Let's see how to write the resulting value:


#Write a value in a specific cell
sheet.Cells(1,2).value = val

#write the sequence
i = 1
for rec in vals:
    sheet.Cells(i,3).value = rec
    i = i + 1

#save workbook
wb.Save()

#close it
wb.Close()

#close COM object
Excel.Quit()

You can see from the example that these operations are also quite simple to implement. In addition, you can notice that we have saved the changes in the same book that we opened for reading, which is quite convenient.

However, the attentive reader will pay attention to the variable i, which is initialized not with 0, as is usual in python, but with 1. This is due to the fact that we work with cell indices as from VBA, and there numbering begins with 1, not 0.

This concludes our discussion of how to work with excel files in python and moves on to the reverse task.

As you can see from the picture, everything worked correctly.

It should be noted that this material uses an older version of ExcelPython and a newer version is available on the author's GitHub.

Conclusion

As a conclusion, I would like to point out that the examples in this article are very simple and I recommend that you take a look at the documentation of the required packages for a deeper understanding of these methods.
documentation for the required packages.

I would also like to note, that the mentioned packages are not the only ones, and the article omits consideration of such packages as xlsxwriter for generating excel files or xlwings, which can work with Excel files "on the fly", and PyXLL, which performs the same ExcelPython functions.

Besides, I've tried to summarise some of the material on the web, because such questions often appear on forums, and I think it would be useful for some people to have such a "cheat sheet" at hand.

"