-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dfxp: examine the contents of div tags for timing information (cbs) #2
Conversation
…od-module into fix/dfxp-timecode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, although admittedly i lack the domain knowledge to judge the logic itself. It seems right... ;-). Fwiw, I think the "style" changes are all pretty small and easy to implement, so maybe makes sense to have them done now rather than later. Can't really comment on the "efficiency" remarks, i trust Andre's notes in the response. The "bug" scenario does seem legit.. should we fix it perhaps by keeping track of div hierarchy in a stack and pushing/popping elements there as the parser encounters div start- and end-tags?
@juce I was thinking we could mark the "level" at which |
This pull request has been superseded because the changes were merged out-of-band into our fork's master branch. This branch's code is simpler, the other code that was merged conforms better to the coding style of the vod module, but should be equivalent. This branch is also missing a bug-fix. Please reopen if you disagree. |
Original Pull Request: kaltura#1136 (comment)
The current dfxp parser does not examine the contents of <div tags for timing information. This causes it to generate an empty manifest for some valid dfxp files. It is valid to have timing information in the <div, and I can provide more details and examples on request, but here are two files:
FILE A: http://flv.io/ABC142908_id.dfxp.xml
FILE B: http://flv.io/ABC142908_en-US.dfxp.xml
FILE B currently generates an empty manifest. This change fixes this.
I have made changes to two functions that examine timing information:
dfxp_get_duration, which traverses the entire XML to compute the ultimate duration of the subtitle's content.
dfxp_parse_frames, which traverses the XML in a similar fashion. It now looks at <p tags and reverts to the last-visited <div if there exists no timing information in the <p tag.
Additionally, there is set of helper functions to remove duplicate code, as reconciling my changes to work inline resulted in cruft. The resulting changes allow FILE B to be parsed, as well as combinations of FILE A and FILE B.
Additional Feedback: kaltura#1136 (comment)
We have a few things to do in this branch based on the external PR feedback, but I believe they are not business-critical. Let's review this internally and merge it into our copy of the repo, and create another task to handle the changes suggested by the reviewer.
Good to have to get the PR merged and keep the code consistent, but not necessary for things to work.
These are nits, the routines do most of their work looking for timecodes in irrelevant places, for example, parsing head. We will not get much value out of these optimizations, because benchmarks on the corpus of our subtitle files before and after the patch have similar runtimes.
I'm glad the reviewer pointed this out. I think the only time this could happen is if the user submits a file that isn't going to work anyway:
It could be nice to fix, but it's an edge-case, so I recommend a separate, lower-priority task for these modifications.