Decide on file structure and naming #2

sethwoodworth · 2015-03-30T15:30:01Z

See @rdhyee's questions on Dubliners PR #1:
GITenberg/Dubliners_2814#1

samwilson · 2015-03-30T23:06:49Z

I think keeping the original files is a good idea (perhaps in an old/ directory), especially for re-importing works from PG (or wherever they might come from in the future). Of course, they're always available in the Git history, but it's easier to diff against a file that's right there and bring the AsciiDoc version up to date at the same time (which can be a manual operation, since the changes are unlikely to ever be very big).

I don't think the intermediate files need to be kept, though (the steps from the original .txt to encoding, to Unicode conversion, to sectioning, etc.). Just the final .asciidoc one (and I reckon that's a better file extension than the sometimes-used .asc; not that I've seen the latter used in Gitenberg).

sethwoodworth · 2015-04-02T18:09:15Z

I agree we shouldn't keep the new intermediate files. And I agree we should keep the original text file. Maybe we should specify all of the available source file types in metadata.yml?

eshellman · 2015-04-02T19:39:27Z

I was thinking there should be a "gitenberg-status" field in the metadata, but please no list of files.


samwilson · 2015-04-05T00:08:47Z

So, the overall structure could look something like:

metadata.yml
1234.asciidoc
1234.html
README
LICENSE
CONTRIBUTING
images/
    cover.png
    frontispiece.png
    ...
old/
    1234.txt
    1234.html
    ...

Where old/ contains the original PG text and any other out of date files. (The multiple files at the top level would be where there's manually-produced other formats.)
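To make the proposal concrete, here is a minimal Python sketch that scaffolds the layout above as empty placeholder files. The function names `proposed_layout` and `scaffold` are hypothetical and not part of any Gitenberg tooling; the paths are only the ones listed in the tree, with the `...` entries left out.

```python
from pathlib import Path


def proposed_layout(pg_id):
    """Relative paths from the layout proposed above (directories end with '/')."""
    return [
        "metadata.yml",
        f"{pg_id}.asciidoc",
        f"{pg_id}.html",
        "README",
        "LICENSE",
        "CONTRIBUTING",
        "images/",
        "old/",
        f"old/{pg_id}.txt",
        f"old/{pg_id}.html",
    ]


def scaffold(root, pg_id):
    """Create empty placeholder files and directories for the proposed layout."""
    root = Path(root)
    for rel in proposed_layout(pg_id):
        target = root / rel
        if rel.endswith("/"):
            # Directory entry: create it (and any missing parents).
            target.mkdir(parents=True, exist_ok=True)
        else:
            # File entry: make sure its parent exists, then create an empty file.
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
```

Calling `scaffold("some-empty-dir", "1234")` would produce the tree shown above, which makes it easy to try the structure out before committing to it.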

PG handles file modifications by creating copies of files... should we do the same here? I know we don't need to, because we've got Git storing the history, but it might make things more transparent and perhaps make pushing things back upstream easier. I'm not sure.

sethwoodworth · 2015-04-05T02:35:13Z

One correction about how things currently are:
PG HTML files are stored at 1234-h/1234.html.
This 1234-h folder optionally contains an images/ folder.

We don't currently plan to check an HTML file generated by us from 1234.asciidoc into the repo. With GitHub tags, we can create a Release and attach arbitrary files to that release for download (HTML, EPUB, MOBI). With the Travis CI API, we may be able to add these back to GitHub automatically. Worst case, we can push them out to a small server that adds compiled outputs to the release tag.

eshellman · 2015-04-05T14:12:37Z

I think that naming files 1234.html and 1234.* is a mistake.

They should get generic names like content.html.

Reasons:

- the ebook builder should be usable before Gutenberg ID assignment
- the ebook builder should be usable for works outside of Gutenberg
- using the ID in the file name adds complexity to the software interfaces
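
A minimal Python sketch of the interface difference, under assumptions: both function names are hypothetical, and `content.asciidoc` is an assumed name chosen by analogy with content.html, not something already agreed in this thread. With ID-based names, every caller has to obtain the ID first, and a work without an ID cannot be built at all; with a generic name, the repo path is enough.

```python
from pathlib import Path

# Assumed generic source name, by analogy with content.html.
GENERIC_SOURCE = "content.asciidoc"


def source_by_id(repo, pg_id):
    """Locate the source file when it is named after the Gutenberg ID."""
    if pg_id is None:
        # Works not yet in PG (or never in PG) have no ID to build a name from.
        raise ValueError("no Gutenberg ID assigned; cannot locate source file")
    return Path(repo) / f"{pg_id}.asciidoc"


def source_generic(repo):
    """Locate the source file when it has a fixed, generic name."""
    return Path(repo) / GENERIC_SOURCE
```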

samwilson · 2015-04-06T07:05:01Z

Good point about more generic filenames, but it does raise the question of dependence on PG. My understanding has been that Gitenberg is a sort of subproject of PG, and so it's okay to rely on their identifiers. Works that are not yet in PG should be submitted there, be given an identifier, and then imported here. Maybe? I'm not sure!


eshellman · 2015-04-06T12:36:10Z

Gitenberg is more of an independent research project.

Most of the texts in PG come from Distributed Proofreaders (DP), which has elaborate process controls to ensure that PG ebooks can be produced from the HTML files they emit. If it succeeds, Gitenberg will need to integrate with DP processes far more than it needs PG IDs in filenames; those IDs are assigned only after the finished files are accepted into PG.

Here are the initial DP processes, from http://www.pgdp.net/wiki/Guiguts_PP_Process_Checklist

Go to Project page
- Read details and requirements.
- bookmark the project URL and note project ID number.
- read the project forum page, note any issues proofers raised.
Make a project folder, e.g. (Win) C:\dp\pp\[bookname] or (Mac/Linux) /dp/pp/[bookname]
Download the text and images files and unpack in new folder:
- Text to [bookname].txt.
- page images (nnn.png) in subfolder pngs
- hi-res illustration scans (imagenn.png) in subfolder originals
- empty subfolder images.

So the key thing here is that a bookname is selected and used as the directory name. If you wanted to continue the practice of also using [bookname] to name text files, you'd need to come up with a reserved list of all the filenames that bookname couldn't be. It's easier to use generic names, I think.
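
As a rough illustration of what that reserved list would involve, here is a minimal Python sketch. The reserved stems below are hypothetical, picked from file names mentioned earlier in this thread; neither DP nor PG defines such a list as far as this discussion shows.

```python
# Hypothetical reserved stems: names the repo would already use for its own files.
RESERVED_STEMS = {"metadata", "readme", "license", "contributing", "content"}


def is_usable_bookname(bookname):
    """Return True if files named '<bookname>.<ext>' cannot shadow a reserved name."""
    stem = bookname.strip().lower()
    # Empty names are rejected; comparison is case-insensitive on purpose,
    # since some filesystems do not distinguish README from readme.
    return bool(stem) and stem not in RESERVED_STEMS
```

Every tool that reads or writes a repo would need to share this check and keep the list in sync, which is the extra complexity that generic names avoid.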

sethwoodworth added the question label Mar 30, 2015
