This is a short note about reading tables from common office document formats into R.
xls and xlsx
I have had good success with the XLConnect package, both with the old xls format and the XML-based xlsx format. The key function for me is readWorksheet, which reads a sheet, or a rectangular region of it, into a data frame.
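    library(XLConnect)
    # Open the workbook, then read cells A3:C7 of "Sheet 1";
    # the first row of the region is used as column names.
    wb <- loadWorkbook("my_file.xls")
    s1 <- readWorksheet(wb, sheet = "Sheet 1", region = "A3:C7", header = TRUE)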
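If the sheet name is misspelled, the error from readWorksheet can be hard to read. As a minimal sketch, the two calls above could be wrapped in a small helper that checks the sheet name first. The name read_block and its arguments are my own invention for illustration, not part of XLConnect; only loadWorkbook, getSheets and readWorksheet come from the package.

```r
library(XLConnect)

# Hypothetical helper (not part of XLConnect): read one rectangular
# region from one sheet of an xls/xlsx file into a data frame.
read_block <- function(path, sheet, region, header = TRUE) {
  wb <- loadWorkbook(path)
  # Fail early with a readable message if the sheet name does not exist.
  if (is.character(sheet) && !(sheet %in% getSheets(wb))) {
    stop("Sheet '", sheet, "' not found. Available sheets: ",
         paste(getSheets(wb), collapse = ", "))
  }
  readWorksheet(wb, sheet = sheet, region = region, header = header)
}

# Usage, assuming my_file.xls exists in the working directory:
# s1 <- read_block("my_file.xls", sheet = "Sheet 1", region = "A3:C7")
```

The usage line is commented out because it depends on a file that only exists on my machine.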
doc and docx
Recently I had to re-save a doc file in the docx format before I could extract a table from it. Extracting tables from docx works like this:
    library(docxtractr)
    # Read the document, then pull every table into a list of data frames.
    docx <- docxtractr::read_docx("my_file.docx")
    tables <- docx_extract_all_tbls(docx)
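Often only one table of the document is needed. The following is a rough sketch of how a single table could be pulled out by its position, using docx_tbl_count and docx_extract_tbl from docxtractr. The helper name extract_nth_table and its arguments are hypothetical and only serve the illustration.

```r
library(docxtractr)

# Hypothetical helper (not part of docxtractr): return the n-th table
# of a docx file as a data frame.
extract_nth_table <- function(path, n = 1, header = TRUE) {
  docx <- docxtractr::read_docx(path)
  n_tables <- docx_tbl_count(docx)
  # Guard against asking for a table that is not in the document.
  if (n < 1 || n > n_tables) {
    stop("Requested table ", n, ", but the document has ", n_tables, " table(s).")
  }
  docx_extract_tbl(docx, tbl_number = n, header = header)
}

# Usage, assuming my_file.docx exists in the working directory:
# first_table <- extract_nth_table("my_file.docx", n = 1)
```

With header = TRUE the first row of the table is used for the column names, which matches most tables I come across; set it to FALSE if the table has no header row.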