Split a file on any character in Python

I need to split a big text file on a certain character. I expect I am being thick about this, but [`split`][split] doesn’t quite do what I want because it includes the matching line, whereas I want to split right on the matching character.

My Python answer:

    def readlines(fileobj, endings, chunksize=4096):
        """Returns a generator that splits on lines in a file with the given
        line-ending. Expects a file object opened in binary mode.
        """
        line = b''
        while True:
            buf = fileobj.read(chunksize)
            if not buf:
                # End of file: yield any data left after the last ending.
                if line:
                    yield line
                break

            line = line + buf

            # Yield every complete line in the buffer, ending included.
            while endings in line:
                idx = line.index(endings) + len(endings)
                yield line[:idx]
                line = line[idx:]

    if __name__ == "__main__":
        import sys, os

        FORMFEED = b'\x0c'  # ASCII 12
        basename = os.path.basename(sys.argv[1])
        with open(sys.argv[1], 'rb') as infile:
            for num, data in enumerate(readlines(infile, endings=FORMFEED)):
                filename = basename + '-' + str(num)
                with open(filename, 'wb') as outfile:
                    outfile.write(data)

This is also useful when reading data exported from some old-fashioned Mac application like [Filemaker 5][filemaker] where the line-endings are ASCII 13 not ASCII 10.
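
As a rough illustration of the carriage-return case, here is a minimal sketch. It restates the generator in bytes-based form so the snippet runs on its own, and it uses an in-memory `io.BytesIO` buffer as a stand-in for a real export file. The sample records and the small `chunksize` are made up for the example.

```python
import io


def readlines(fileobj, endings, chunksize=4096):
    """Yield successive pieces of a binary file object, each ending in `endings`."""
    line = b""
    while True:
        buf = fileobj.read(chunksize)
        if not buf:
            if line:
                yield line  # trailing data with no final ending
            break
        line += buf
        while endings in line:
            idx = line.index(endings) + len(endings)
            yield line[:idx]
            line = line[idx:]


CR = b"\r"  # ASCII 13, the classic Mac line ending

# Hypothetical data; a real export would come from open(path, "rb").
export = io.BytesIO(b"first record\rsecond record\rthird record")

# A tiny chunksize forces records to straddle several reads.
records = [piece.rstrip(CR) for piece in readlines(export, CR, chunksize=8)]
print(records)
```

Reading in binary mode matters here: in text mode Python's default newline handling would translate a lone ASCII 13 to ASCII 10 before the generator ever saw it.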

This post was inspired by [Lotus Notes][lotus] version 8.5, which is so advanced that to save a message in a file on disk you have to export it as structured text. And if you want to save a whole bunch of messages as individual files, you must forget that [drag-and-drop was introduced with System 7][mactech]; that would be too obvious.

[filemaker]: http://www.filemaker.com/support/downloads/downloads_prev_versions.html
[split]: http://developer.apple.com/Mac/library/documentation/Darwin/Reference/ManPages/man1/split.1.html
[lotus]: http://www-01.ibm.com/software/lotus/products/notes/
[mactech]: http://www.mactech.com/articles/mactech/Vol.10/10.06/DragAndDrop/index.html

6 thoughts on “Split a file on any character in Python”

  1. Claes Wallin

    Haha! Quite a surprise to find that the first search result on splitting files on a character wanted to split Lotus Notes structured text files, just like me! I like your solution; Very elegant, very pythonic.

  2. Claes Wallin

    Thanks for the offer, but my immediate problem is already solved. Just needed to diff some documents, and couldn’t be bothered to set up the “delta of 2 documents” menu hack. :-)

  3. david Post author

    Rather chuffed that anyone would describe my efforts as Pythonic! What is the “delta of 2 documents menu hack”? I have a morbid fascination with Lotus Notes now that I no longer use it in anger.

  4. Claes Wallin

    Hehe, so now I have learned my New English Word Of The Day, thanks to you and the online Merriam-Webster. http://www.merriam-webster.com/dictionary/chuffed

    There is built-in UI functionality in Lotus Notes to compare two documents, it’s in one of the included DLLs. However, it is not accessible from the UI unless you add a line in the .ini file defining a menu item to call the thing. Very weird, very typical Lotus Notes.
