Implement NLP-Based Data Extraction System for ORCA Log Files #99

Open
SiriChandanaGarimella opened this issue Nov 11, 2024 · 0 comments
@SiriChandanaGarimella
Collaborator

Is your feature request related to a problem? Please describe.
Based on the analysis in Issue #98, we need an NLP-based system for extracting data from ORCA log files. The current rule-based system should be replaced with a more robust solution that maintains extraction accuracy, scales efficiently, and reduces maintenance effort.

Describe the solution you'd like
Implement a Python-based extraction system using spaCy and scikit-learn to extract search terms and other sections from ORCA log files. The system should handle multiple sections, preserve the structure of the extracted data, and handle errors properly. Use a hybrid approach (for example, rules combined with NLP) if required.
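
To make the requested behaviour more concrete, here is a minimal, standard-library-only sketch of how a hybrid extractor could be shaped. It is not the project's implementation. The names `SectionResult`, `rule_extract`, `extract_sections`, and the `fallback` hook are hypothetical, and the sample log is an invented fragment rather than real ORCA output. The `fallback` callable marks where a spaCy or scikit-learn component could plug in; it is left as a plain function here so the sketch runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SectionResult:
    """One extracted section; every search term gets exactly one of these."""
    term: str
    lines: List[str] = field(default_factory=list)
    source: str = "rule"          # "rule", "fallback", or "missing"
    error: Optional[str] = None   # set only if extraction raised


def rule_extract(text: str, term: str) -> List[str]:
    """Rule pass: lines from the first line containing `term` up to the next blank line."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if term in line:
            block = []
            for following in lines[i:]:
                if not following.strip():
                    break
                block.append(following.strip())
            return block
    return []


def extract_sections(
    text: str,
    terms: List[str],
    fallback: Optional[Callable[[str, str], List[str]]] = None,
) -> Dict[str, SectionResult]:
    """Try the rule pass first, then the (hypothetical) NLP fallback for each term."""
    results: Dict[str, SectionResult] = {}
    for term in terms:
        result = SectionResult(term=term)
        try:
            result.lines = rule_extract(text, term)
            if not result.lines and fallback is not None:
                result.lines = fallback(text, term)
                result.source = "fallback"
            if not result.lines:
                result.source = "missing"
        except Exception as exc:
            # A failure on one section must not corrupt or drop the others.
            result.lines = []
            result.source = "missing"
            result.error = str(exc)
        results[term] = result
    return results


# Invented fragment for illustration only (assumption, not real ORCA output).
SAMPLE_LOG = """\
FINAL SINGLE POINT ENERGY      -76.026760

CARTESIAN COORDINATES (ANGSTROEM)
O   0.000000   0.000000   0.117000
H   0.000000   0.757000  -0.469000

"""

if __name__ == "__main__":
    terms = ["FINAL SINGLE POINT ENERGY", "CARTESIAN COORDINATES", "DIPOLE MOMENT"]
    for term, res in extract_sections(SAMPLE_LOG, terms).items():
        print(term, res.source, len(res.lines))
```

Returning one `SectionResult` per requested term, even when nothing is found or an exception is raised, keeps the output structure stable for downstream code, which is one way to read the "data structure integrity with proper error handling" requirement.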
