Decide on file structure and naming #2

Open
sethwoodworth opened this issue Mar 30, 2015 · 8 comments

@sethwoodworth
Contributor

See @rdhyee's questions on Dubliners PR #1:
GITenberg/Dubliners_2814#1

@samwilson
Contributor

I think keeping the original files is a good idea (perhaps in the old directory), especially for re-importing works from PG (or wherever they might come from in the future). They're always available in the Git history, of course, but it's easier to diff against a file that's right there and bring the AsciiDoc version up to date at the same time (which can be a manual operation, since the changes are unlikely ever to be very big).

I don't think the intermediate files need to be kept, though (the steps from the original .txt through encoding, Unicode conversion, sectioning, etc.). Just the final .asciidoc one (and I reckon that's a better file extension than the sometimes-used .asc; not that I've seen the latter used in Gitenberg).

@sethwoodworth
Contributor Author

I agree we shouldn't keep the new intermediate files. And I agree we should keep the original text file. Maybe we should specify all of the available source file types in metadata.yml?

@eshellman
Contributor

I was thinking there should be a "gitenberg-status" field in the metadata, but please no list of files.

@samwilson
Contributor

So, the overall structure could look something like:

metadata.yml
1234.asciidoc
1234.html
README
LICENSE
CONTRIBUTING
images/
    cover.png
    frontispiece.png
    ...
old/
    1234.txt
    1234.html
    ...

Here old/ contains the original PG text and any other out-of-date files. (Multiple files at the top level would appear only where other formats have been produced manually.)
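
For illustration, the proposed layout can be sketched as a small check that reports which expected entries a repository is missing. This is only a sketch under assumptions: the function name `check_layout` and the choice of which entries count as required are hypothetical, and images/ is treated as optional here.

```python
from pathlib import Path

# Hypothetical set of required entries, read off the layout proposed above.
REQUIRED_FILES = ("metadata.yml", "README", "LICENSE", "CONTRIBUTING")
REQUIRED_DIRS = ("old",)


def check_layout(repo_dir, book_id):
    """Return a sorted list of entries missing from the proposed layout."""
    root = Path(repo_dir)
    missing = []
    # The AsciiDoc source is named after the book id in this proposal.
    for name in REQUIRED_FILES + (f"{book_id}.asciidoc",):
        if not (root / name).is_file():
            missing.append(name)
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            missing.append(name + "/")
    return sorted(missing)
```

Calling `check_layout("path/to/repo", 1234)` on a complete repository returns an empty list.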

PG handles file modifications by creating copies of files... should we do the same here? I know we don't need to, because Git stores the history, but it might make things more transparent and perhaps make pushing changes back upstream easier. I'm not sure.

@sethwoodworth
Contributor Author

One correction about how things currently are:
PG HTML files are stored at 1234-h/1234.html.
This 1234-h folder optionally contains an images/ folder.

We don't currently plan to check an HTML file generated from 1234.asciidoc into the repo. With GitHub tags, we can create a Release and attach arbitrary files to that release for download (HTML, EPUB, MOBI). With the Travis CI API, we may be able to add these back to GitHub automatically. Worst case, we can push them out to a small server that adds compiled outputs to the release tag.

@eshellman
Contributor

I think that naming files 1234.html and 1234.* is a mistake.

They should get generic names like content.html.

Reasons:

  • the ebook builder should be usable before gutenberg ID assignment
  • the ebook builder should be usable for works outside of gutenberg
  • using the id in the file name adds complexity to the software interfaces
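
To make the third point concrete, here is a minimal sketch contrasting the two naming schemes. Both function names and the generic source name `content.asciidoc` (chosen by analogy with content.html) are assumptions for illustration, not part of any existing builder.

```python
from pathlib import Path

# Assumed generic name, by analogy with "content.html" above.
GENERIC_SOURCE = "content.asciidoc"


def source_path_by_id(repo_dir, pg_id):
    """ID-based naming: every caller has to know the PG id up front."""
    if pg_id is None:
        raise ValueError("no Project Gutenberg id assigned yet")
    return Path(repo_dir) / f"{pg_id}.asciidoc"


def source_path_generic(repo_dir):
    """Generic naming: the path depends only on the repository location."""
    return Path(repo_dir) / GENERIC_SOURCE
```

With the generic scheme the id never has to be threaded through the builder's interfaces, and a work without an id still has a well-defined source path.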

@samwilson
Contributor

Good point about more generic filenames, but it does raise the point about dependence on PG. My understanding has been that Gitenberg is a sort of subproject of PG and so it's okay to rely on their identifiers. Works that are not yet in PG should be submitted there, be given an identifier, and then imported here. Maybe? I'm not sure!

@eshellman
Contributor

Gitenberg is more of an independent research project.

Most of the texts in PG come from Distributed Proofreaders (DP), which has elaborate process controls to ensure that PG ebooks can be produced from the HTML files they emit. If Gitenberg succeeds, it will have a stronger need to integrate with DP processes than to use PG ids in filenames; those ids are assigned only after the finished files are accepted into PG.

Here are the DP initial processes, from http://www.pgdp.net/wiki/Guiguts_PP_Process_Checklist

  • Go to Project page
    • Read details and requirements.
    • bookmark the project URL and note project ID number.
    • read the project forum page, note any issues proofers raised.
  • Make a project folder, e.g. (Win) C:\dp\pp\[bookname] or (Mac/Linux) /dp/pp/[bookname]
  • Download the text and images files and unpack in new folder:
    • Text to [bookname].txt.
    • page images (nnn.png) in subfolder pngs
    • hi-res illustration scans (imagenn.png) in subfolder originals
    • empty subfolder images.
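
The folder setup in the checklist above could be scripted roughly as follows. This is a sketch only: `make_project_folder` is a hypothetical helper, and it covers just the directory creation, not the download or unpacking steps.

```python
from pathlib import Path

# Subfolder names taken from the checklist above.
DP_SUBFOLDERS = ("pngs", "originals", "images")


def make_project_folder(base_dir, bookname):
    """Create base_dir/bookname with the three checklist subfolders."""
    project = Path(base_dir) / bookname
    for sub in DP_SUBFOLDERS:
        # parents=True also creates the project folder itself if needed.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

For example, `make_project_folder("/dp/pp", "dubliners")` would create /dp/pp/dubliners with pngs, originals, and images inside it; the text would then be saved there as dubliners.txt.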

So the key thing here is that a bookname is selected and used as the directory name. If you wanted to continue the practice of also using [bookname] to name text files, you'd need to come up with a reserved list of all the filenames that bookname couldn't be. It's easier to use generic names, I think.
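
A rough sketch of what such a reserved-name check might look like. The reserved list below is hypothetical, drawn from the layout proposed earlier in this thread, and would have to grow every time a new fixed filename is introduced.

```python
# Hypothetical reserved names, taken from the layout proposed earlier in the thread.
RESERVED_NAMES = frozenset(
    {"metadata.yml", "readme", "license", "contributing", "images", "old"}
)


def collides(bookname, extension=""):
    """Return True if a file named bookname + extension clashes with a reserved name."""
    # Compare case-insensitively, since some filesystems ignore case.
    return f"{bookname}{extension}".lower() in RESERVED_NAMES
```

With generic names like content.html the check is unnecessary, because the bookname appears only in the directory name.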
