In reply to remus:
> Do you think Rockfax will convert to being digital first at some point? i.e. will the database + interactive topos become the reference rather than the books? From my totally naive "wouldn't it be easier if you just..." point of view, I would have thought going from a structured database into a book format would have been easier than doing things the other way round.
We've thought a lot about this. I'd love to remove some of the complexity, and at first blush this idea seems like it could do that. But every time I think through the implications, I come to the conclusion that it just wouldn't work that well for us. There are a few reasons for this:
A very large portion of the work in making a book is the layout, and that can't be automated. So a DB-first system would, at best, spit out a load of topos and route descriptions that would then need laying out by a human. As soon as you have this half-automated system, you end up with two "sources of truth". Imagine you've done the initial DB-to-InDesign export, spent months working on the book, and then received corrections for some routes. You now have to put them in two places. If the DB export were 100% automated you'd just put them in the DB and press the "generate book" button again, but it wouldn't be. We already have that the other way round, though: any changes to the book information can be extracted and inserted into the database in a few clicks.
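To illustrate why the one-way flow (book to database) avoids the two-sources-of-truth problem, here's a minimal sketch, assuming a hypothetical `routes` table keyed by a stable `route_id` and a hypothetical `sync_routes` helper fed with records already extracted from the book. It's not how our system actually stores things, just the general idea: every sync overwrites the DB row from the book, so re-running it after a correction is harmless.

```python
import sqlite3

# Hypothetical schema: one row per route, keyed by an ID that stays stable
# while the book is edited.
SCHEMA = """
CREATE TABLE IF NOT EXISTS routes (
    route_id    TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    grade       TEXT NOT NULL,
    description TEXT NOT NULL
)
"""


def sync_routes(conn, extracted):
    """Upsert routes extracted from the book into the database.

    The book is the only source of truth: each extracted record overwrites
    whatever the DB holds for that route_id, so syncing twice is idempotent.
    """
    conn.executemany(
        """
        INSERT INTO routes (route_id, name, grade, description)
        VALUES (:route_id, :name, :grade, :description)
        ON CONFLICT(route_id) DO UPDATE SET
            name = excluded.name,
            grade = excluded.grade,
            description = excluded.description
        """,
        extracted,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    first = [{"route_id": "r1", "name": "Example Arete",
              "grade": "VS 4c", "description": "Climb the arete."}]
    sync_routes(conn, first)
    # A correction arrives: fix it in the book, re-extract, re-sync.
    sync_routes(conn, [{**first[0], "grade": "HVS 5a"}])
    print(conn.execute("SELECT grade FROM routes").fetchall())
```

The route name and grades are made up; the point is that corrections only ever go into the book, and the DB just follows.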
Another big one is that we're beginning to act as a platform for third parties to publish their guides on (the SMC is the first publisher to get on board). Any back catalogue is going to be InDesign-based, so a DB-first setup wouldn't help there.
In the end, we've already got the system we've got, so building a new one that doesn't totally replace it and remove all of the problems we face would just add to the overall size and complexity.
> I'd also be interested to understand how you deal with changes in formatting between different books. I imagine the layout in the books has evolved through time, so is it tricky accounting for those changes when you're extracting the data?
This was a big deal in the beginning. We had all these books we wanted to put into rfdigital, some made that year and some made eight years before, so the variation was massive. The big problem was how inconsistently the conventions had been applied. We dealt with it either by adding flexibility to the parser (which uses contextual information about the current page or object in InDesign, plus a collection of regular-expression patterns for breaking apart text), or sometimes by admitting that not everything needs to be automated and just fixing the broken bits in InDesign when I came across them.
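For anyone curious what "context plus a collection of regex patterns" looks like in principle, here's a rough sketch in Python. It's not our actual parser: the two route-line formats, the `Route` fields and `parse_route` are all hypothetical. The crag name stands in for the contextual information (in reality that would come from the current page or object in InDesign), each pattern covers one layout convention, and anything that matches nothing is returned as `None` so it can be fixed by hand.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    crag: str          # from context (e.g. the page), not from the text itself
    number: int
    name: str
    grade: str
    description: str


# Hypothetical conventions from different editions, tried in order.
PATTERNS = [
    # Newer style: "12 Route Name  HVS 5a  Description..."
    re.compile(r"^(?P<number>\d+)\s+(?P<name>.+?)\s{2,}"
               r"(?P<grade>[A-Z]+\d*\s+\d[abc])\s{2,}(?P<description>.+)$"),
    # Older style: "12. Route Name (HVS 5a) Description..."
    re.compile(r"^(?P<number>\d+)\.\s+(?P<name>.+?)\s+"
               r"\((?P<grade>[^)]+)\)\s+(?P<description>.+)$"),
]


def parse_route(text: str, crag: str) -> Optional[Route]:
    """Try each known convention; None means 'fix this one by hand'."""
    for pattern in PATTERNS:
        m = pattern.match(text.strip())
        if m:
            return Route(crag=crag, number=int(m["number"]), name=m["name"],
                         grade=m["grade"], description=m["description"])
    return None


if __name__ == "__main__":
    print(parse_route("12 Route Name  HVS 5a  Climb the crack.", "Example Crag"))
    print(parse_route("7. Old Route (VS 4c) Follow the groove.", "Example Crag"))
```

Adding flexibility then mostly means adding or loosening a pattern, which is exactly why it can get unwieldy if every quirk of every old book gets its own rule.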
Often in old books, some things will have been done differently to now, but at least consistently, so it's way easier to write a throwaway InDesign script to change these things globally than it is to add more rules to the parsing engine to account for them.
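As a plain-text stand-in for that kind of throwaway fix (a real one would be an InDesign script working on the document itself, not Python on exported text), here's a minimal sketch. The two "old conventions", dotted grades like "H.V.S." and "--" where house style now uses an en dash, are made-up examples, and `normalise` is a hypothetical name. The idea is that a one-off global pass brings the old book into line, so the parser never needs to know those conventions existed.

```python
import re


def _undot(match):
    # "H.V.S." -> "HVS"
    return match.group(0).replace(".", "")


# Hypothetical one-off fixes for one old edition that did things
# differently, but consistently.
OLD_EDITION_FIXES = [
    (re.compile(r"\b(?:[A-Z]\.){2,}"), _undot),   # dotted grades
    (re.compile(r"--"), "\u2013"),                # double hyphen -> en dash
]


def normalise(text: str) -> str:
    """Apply every fix globally, once, before the text reaches the parser."""
    for pattern, replacement in OLD_EDITION_FIXES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(normalise("H.V.S. 5a, pitches 1--2"))
```

Once a book has been through it the script can be thrown away, which is the whole appeal compared with permanently growing the parser.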