The [BBC used to publish their schedules][tarball] in a great big XML tarball. They did this daily and it contained the broadcasts for the next seven days. I wrote a script that created [iCalendar][ical] files out of this for each channel.
But a couple of weeks ago they [stopped][old] publishing the schedules in this format. In its place they provide [schedules for each channel, by genre or by series][new], in other formats as well as XML.
One great advantage of the new data is that it is also available in iCalendar format, which _almost_ makes my script redundant. The one thing missing is a weekly schedule for each channel.
[I re-wrote my script][guide] to use the newer schedule data and I got opinions!
1. Only daily schedules available for channels.
You can ask for today’s schedule, for tomorrow’s schedule, or for a specific date’s schedule. You cannot ask for a week’s schedule (which is pretty much all I want) or for anything other than a day’s schedule.
However, for a particular programme (but not for a channel), you can get the upcoming schedule.
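Building a week out of daily schedules means asking for seven consecutive dates. Here is a minimal sketch of generating those requests. The URL pattern is modelled on the dated schedule page linked later in this post, and the `.json` suffix and the name `week_of_schedule_urls` are assumptions for illustration, so check the developer documentation before relying on them.

```python
from datetime import date, timedelta

# Assumed URL pattern: service name, then /programmes/schedules/YYYY/MM/DD.
# The ".json" suffix is a guess at how the JSON feed is addressed.
SCHEDULE_URL = (
    "http://www.bbc.co.uk/{service}/programmes/schedules/{day:%Y/%m/%d}.json"
)


def week_of_schedule_urls(service, start, days=7):
    """Return one daily schedule URL per day, beginning with `start`."""
    return [
        SCHEDULE_URL.format(service=service, day=start + timedelta(days=offset))
        for offset in range(days)
    ]


urls = week_of_schedule_urls("5live", date(2010, 10, 2))
```

Each URL then needs fetching and parsing separately, which is where the overlap between days starts to matter.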
2. A programme title consisting only of numbers is encoded as an integer.
I came across [a broadcast where the programme title was “606”][606] but the title appeared in the JSON data as `"title": 606`. The value of the title should always be a string, as should other textual values.
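Until that is fixed, a consumer has to coerce the value itself. A small sketch, assuming a JSON fragment shaped like the one above (the real feed has many more fields, and `text_value` is a hypothetical helper name):

```python
import json

# Illustrative fragment only: the title arrives as a number, not a string.
raw = '{"title": 606}'


def text_value(value):
    """Coerce a JSON scalar to text; a missing value becomes an empty string."""
    return "" if value is None else str(value)


broadcast = json.loads(raw)
title = text_value(broadcast["title"])
```

Passing every textual field through the same helper keeps the rest of the script from caring which type the feed happened to emit.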
3. Post-midnight broadcasts appear in the schedule for two days.
This only becomes a problem because I need to combine several consecutive days’ schedules into a week’s schedule. Today’s schedule includes programmes that are broadcast after midnight. Tomorrow’s schedule also includes the programmes that start immediately after midnight, even though they already appeared on today’s schedule.
This behaviour makes perfect sense when the schedules are consumed by a real person. If I look up today’s schedule I _want_ to see those transmissions that happen early the next morning, even though they are strictly not part of today’s schedule.
But for consuming the data programmatically, the overlap means you have to deal with duplicate events. In my case I converted transmission events to tuples, which are easy to compare, and weeded out the duplicates by throwing them all [into a Python set][set].
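The tuple-and-set approach can be sketched roughly like this. The dictionary keys (`start`, `end`, `title`), the function names and the sample programmes are all made up for illustration and are not the feed's real structure.

```python
from datetime import datetime


def as_tuple(event):
    """Reduce an event to a hashable tuple so equal events compare equal."""
    return (event["start"], event["end"], event["title"])


def merge_days(*days):
    """Combine daily schedules, dropping events that appear on two days."""
    unique = set()
    for day in days:
        unique.update(as_tuple(event) for event in day)
    # Tuples sort by their first element, so this orders by start time.
    return sorted(unique)


# Invented sample data: "Weather" starts after midnight and is on both days.
today = [
    {"start": datetime(2010, 10, 2, 23, 0), "end": datetime(2010, 10, 3, 0, 30), "title": "Late Film"},
    {"start": datetime(2010, 10, 3, 0, 30), "end": datetime(2010, 10, 3, 1, 0), "title": "Weather"},
]
tomorrow = [
    {"start": datetime(2010, 10, 3, 0, 30), "end": datetime(2010, 10, 3, 1, 0), "title": "Weather"},
    {"start": datetime(2010, 10, 3, 6, 0), "end": datetime(2010, 10, 3, 9, 0), "title": "Breakfast"},
]

week = merge_days(today, tomorrow)
```

This only works if the duplicated event is identical in every field that goes into the tuple, which is why it pays to keep the tuple small.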
There is a conflict between the notion of a calendar day and a transmission day. For most BBC channels the day begins and ends sometime between 1 o’clock and 5 o’clock in the morning. For some channels it isn’t obvious that one day’s programming has stopped and another has begun, because there may be no dead air where nothing is transmitted. But think back to the good old days, when all you got in the middle of the night was the white dot to let you know the telly was still working.
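One way to picture the difference is to assign each broadcast to a transmission day by shifting its start time back by a cutoff hour. This is only a sketch: the cutoff of 5 is an arbitrary assumption (the real boundary varies by channel), and `transmission_day` is a hypothetical helper, not something the feeds provide.

```python
from datetime import date, datetime, timedelta


def transmission_day(start, cutoff_hour=5):
    """Return the transmission day for a broadcast starting at `start`.

    Anything starting before `cutoff_hour` counts as part of the previous
    calendar day's programming. The default cutoff is an assumption.
    """
    return (start - timedelta(hours=cutoff_hour)).date()


late_night = transmission_day(datetime(2010, 10, 3, 0, 30))
breakfast = transmission_day(datetime(2010, 10, 3, 6, 0))
```

Here `late_night` falls on 2 October even though the broadcast starts on 3 October, while `breakfast` stays on 3 October.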
My re-written script pulls the schedule using the JSON data feeds, but I wonder whether things would be easier if it just concatenated the events published in the daily iCalendar files. I guess we’ll never ever know.
[tarball]: http://backstage.bbc.co.uk/data/7DayListingData?v=16wk
[ical]: http://en.wikipedia.org/wiki/ICalendar
[new]: http://www.bbc.co.uk/programmes/developers
[old]: http://www0.rdthdo.bbc.co.uk/services/api/
[606]: http://www.bbc.co.uk/5live/programmes/schedules/2010/10/02
[guide]: http://reliablybroken.com/guide/
[set]: http://docs.python.org/library/stdtypes.html#set-types-set-frozenset