I need to split a big text file on a certain character. I expect I am being thick about this, but [`split`][split] doesn’t quite do what I want because it includes the matching line, whereas I want to split right on the matching character.
My Python answer:

    def readlines(fileobj, endings, chunksize=4096):
        """Returns a generator that splits on lines in a file with the given
        line-ending. Each yielded line includes its line-ending.
        """
        line = ''
        while True:
            buf = fileobj.read(chunksize)
            if not buf:
                # End of file: yield whatever follows the last line-ending.
                yield line
                break
            line = line + buf
            while endings in line:
                idx = line.index(endings) + len(endings)
                yield line[:idx]
                line = line[idx:]

    if __name__ == "__main__":
        import sys, os
        FORMFEED = chr(12)  # ASCII 12
        basename = os.path.basename(sys.argv[1])
        # newline='' stops Python 3 from translating line-endings.
        with open(sys.argv[1], newline='') as infile:
            for num, data in enumerate(readlines(infile, endings=FORMFEED)):
                filename = basename + '-' + str(num)
                with open(filename, 'w', newline='') as outfile:
                    outfile.write(data)

This is also useful when reading data exported from some old-fashioned Mac application like [Filemaker 5][filemaker] where the line-endings are ASCII 13 not ASCII 10.
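
To make the ASCII 13 case concrete, here is a minimal Python 3 sketch. The helper name `split_on` and the sample records are made up for illustration. It repeats the buffering approach of the script above so that it runs on its own, and it reads from an in-memory `io.StringIO` rather than a real export file.

```python
import io

def split_on(fileobj, ending, chunksize=4096):
    """Yield pieces of text read from fileobj, each one ending with `ending`.

    Hypothetical helper for illustration only. Unlike the script above,
    it skips an empty final piece.
    """
    pending = ''
    while True:
        buf = fileobj.read(chunksize)
        if not buf:
            if pending:
                yield pending
            return
        pending += buf
        while ending in pending:
            idx = pending.index(ending) + len(ending)
            yield pending[:idx]
            pending = pending[idx:]

CR = chr(13)  # ASCII 13, the old Mac line-ending

# Made-up sample standing in for an exported file.
sample = io.StringIO("name,city" + CR + "Ada,London" + CR + "Grace,Arlington")

# Strip the trailing CR from each record after splitting.
records = [line.rstrip(CR) for line in split_on(sample, CR)]
print(records)  # ['name,city', 'Ada,London', 'Grace,Arlington']
```

With a real file on Python 3, opening it with `open(path, newline='')` should stop the default universal-newline handling from turning each ASCII 13 into ASCII 10 before the generator sees it.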
This post was inspired by [Lotus Notes][lotus] version 8.5, which is so advanced that to save a message in a file on disk you have to export it as structured text. And if you want to save a whole bunch of messages as individual files you must forget that [drag-and-drop was introduced with System 7][mactech]; that would be too obvious.
[filemaker]: http://www.filemaker.com/support/downloads/downloads_prev_versions.html
[split]: http://developer.apple.com/Mac/library/documentation/Darwin/Reference/ManPages/man1/split.1.html
[lotus]: http://www-01.ibm.com/software/lotus/products/notes/
[mactech]: http://www.mactech.com/articles/mactech/Vol.10/10.06/DragAndDrop/index.html
Haha! Quite a surprise to find that the first search result on splitting files on a character wanted to split Lotus Notes structured text files, just like me! I like your solution: very elegant, very Pythonic.
The complete script parsed the message headers to create filenames containing the date and subject. E-mail me if you want it.
Thanks for the offer, but my immediate problem is already solved. Just needed to diff some documents, and couldn’t be bothered to set up the “delta of 2 documents” menu hack. :-)
Rather chuffed that anyone would describe my efforts as Pythonic! What is the “delta of 2 documents menu hack”? I have a morbid fascination with Lotus Notes now that I no longer use it in anger.
Hehe, so now I have learned my New English Word Of The Day, thanks to you and the online Merriam-Webster. http://www.merriam-webster.com/dictionary/chuffed
There is built-in UI functionality in Lotus Notes to compare two documents; it’s in one of the included DLLs. However, it is not accessible from the UI unless you add a line in the .ini file defining a menu item to call the thing. Very weird, very typical Lotus Notes.
Thanks! I also needed a script to extract my mail messages from Lotus…