Working with old databases means working with truly odd data dumps. One of the strangest I ever got was from a database of more than one million historical art auctions dating back to the seventeenth century.

I was sent one text file of almost 10 million lines that looked like this:

    --RECORD NUMBER--1
    BBF Number 2983
    Edit Status Unedited
    Lot Sale Date 1827/06/01
    Sale Begin Date 1827/06/01
    Auct. Stanley (G.)
    Lot Number 0068
    Verbatim Artist Paolo Veronese
    Verbatim Artist Carlo Veronese
    Verbatim Artist Gabriel Veronese
    Verbatim Artist Benito Veronese
    Title The Baptism of Christ by St.
    Verbatim Seller Count Altamira
    Transaction Sold
    Price 52.10 |c £
    Verbatim Buyer Hume
    --RECORD NUMBER--2
    BBF Number 14756
    ...

Reading the entire file in gives a character vector, with one line per slot in the vector.
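As a rough sketch of that read step: the original call isn't shown here, so this assumes base R's readLines(), and it uses a tiny temporary file as a stand-in for the real dump. The names sample_lines, dump_path, and raw_lines are made up for illustration.

```r
# Hypothetical stand-in for the real dump: a few sample lines in a temp file.
sample_lines <- c(
  "--RECORD NUMBER--1",
  "BBF Number 2983",
  "Edit Status Unedited",
  "--RECORD NUMBER--2",
  "BBF Number 14756"
)
dump_path <- tempfile(fileext = ".txt")
writeLines(sample_lines, dump_path)

# Assumption: base R's readLines() is used for the read. It returns a
# character vector with one element per line of the file.
raw_lines <- readLines(dump_path)

# Inspect the first few slots of the vector.
head(raw_lines)
```

For a file of around 10 million lines this still fits in memory on most machines, though something like readr::read_lines() may be faster.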

I'll wrap this vector in a data frame, essentially creating a one-column table that we can then manipulate with dplyr and tidyr.
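Here is a minimal sketch of that wrapping step, using a short sample vector in place of the real file. The column names line and record_id are my own placeholders, and the record_id step is only one possible first manipulation (an assumption, not necessarily what comes next): it numbers the records by counting the "--RECORD NUMBER--" marker lines.

```r
library(dplyr)

# Hypothetical sample standing in for the full character vector.
raw_lines <- c(
  "--RECORD NUMBER--1",
  "BBF Number 2983",
  "Edit Status Unedited",
  "--RECORD NUMBER--2",
  "BBF Number 14756"
)

# Wrap the vector in a one-column data frame.
auction_lines <- data.frame(line = raw_lines, stringsAsFactors = FALSE)

# Assumed next step: every marker line starts a new record, so a running
# count of marker lines labels each line with the record it belongs to.
auction_lines <- auction_lines %>%
  mutate(record_id = cumsum(startsWith(line, "--RECORD NUMBER--")))

auction_lines
```

Once every line carries a record identifier, the field name and value can be split apart and reshaped, for example with tidyr.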

