feat: enhance DegradationAnalysis to support question-answering task #1153
… and add evaluation method
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
…ground truth in DegradationAnalysis
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
langtest/transform/accuracy.py:1337
- This line assumes that 'values' always has at least one element. If 'values' is empty, accessing [0] raises an 'IndexError'. Consider checking that 'values' is not empty before accessing its first element.
ground_truth = X_test[X_test.index == index]["expected_results"].values[0]
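A minimal sketch of the guard the comment suggests, assuming X_test is a pandas DataFrame with an "expected_results" column as in the quoted line. The helper name get_ground_truth and the choice to return None for a missing index are hypothetical; raising a descriptive error instead would be an equally reasonable design.

```python
from typing import Any, Optional

import pandas as pd


def get_ground_truth(X_test: pd.DataFrame, index: Any) -> Optional[Any]:
    """Hypothetical helper: return the expected result for `index`, or None if absent."""
    # Same selection as the quoted line: rows whose index equals `index`
    values = X_test[X_test.index == index]["expected_results"].values
    # Guard against an empty selection instead of indexing [0] unconditionally
    if len(values) == 0:
        return None
    return values[0]


frame = pd.DataFrame({"expected_results": ["Paris", "4"]}, index=[0, 1])
print(get_ground_truth(frame, 1))  # 4
print(get_ground_truth(frame, 7))  # None
```

Returning None pushes the decision to the caller, which can then skip the sample or count it as a miss explicitly.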
…s accuracy calculation
…uestion-answering task evaluation
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (3)
langtest/transform/accuracy.py:1190
- [nitpick] The variable name 'x' is ambiguous. It should be renamed to 'sample' for clarity.
x.original_context if x.original_context else "" for x in X_test
langtest/transform/accuracy.py:1192
- [nitpick] The variable name 'x' is ambiguous. It should be renamed to 'sample' for clarity.
x.original_question for x in X_test
langtest/transform/accuracy.py:1193
- [nitpick] The variable name 'x' is ambiguous. It should be renamed to 'sample' for clarity.
x.expected_results for x in X_test
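With the suggested rename applied, the three comprehensions could read as below. This is a sketch: QASampleStub is a hypothetical stand-in carrying only the three attributes referenced above, and the column names in the returned dict are assumptions, not necessarily the keys the PR uses.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class QASampleStub:
    """Hypothetical stand-in for langtest's QASample (only the fields used here)."""
    original_question: str
    expected_results: Any
    original_context: Optional[str] = None


def qa_columns(X_test: List[QASampleStub]) -> Dict[str, List[Any]]:
    """Collect per-sample fields into columns, naming the loop variable `sample`."""
    return {
        # Missing contexts become empty strings, mirroring the quoted line
        "original_context": [
            sample.original_context if sample.original_context else ""
            for sample in X_test
        ],
        "original_question": [sample.original_question for sample in X_test],
        "expected_results": [sample.expected_results for sample in X_test],
    }


columns = qa_columns([
    QASampleStub("What is the capital of France?", ["Paris"], "Paris is the capital of France."),
    QASampleStub("What is 2 + 2?", ["4"]),
])
print(columns["original_context"])  # ['Paris is the capital of France.', '']
```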
…curacy calculations
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
langtest/transform/accuracy.py:1186
- [nitpick] The variable name X_test is used for both a list of QASample objects and a DataFrame. Consider renaming the DataFrame variable to avoid ambiguity.
if len(X_test) and isinstance(X_test[0], QASample):
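One way to address this comment is to keep the incoming list under its own name and give the converted DataFrame a distinct one. In this sketch, QASampleStub, to_qa_frame, qa_samples and qa_frame are hypothetical names chosen for illustration, and passing an existing DataFrame through unchanged is an assumption about how non-QA inputs are handled.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Union

import pandas as pd


@dataclass
class QASampleStub:
    """Hypothetical stand-in for langtest's QASample (only the fields used here)."""
    original_question: str
    expected_results: Any
    original_context: Optional[str] = None


def to_qa_frame(X_test: Union[List[Any], pd.DataFrame]) -> pd.DataFrame:
    """Convert QA samples to a DataFrame without reusing one name for two types."""
    if isinstance(X_test, pd.DataFrame):
        return X_test  # already tabular, nothing to convert
    if len(X_test) and isinstance(X_test[0], QASampleStub):
        qa_samples: List[QASampleStub] = X_test  # the list keeps a list-specific name
        # The DataFrame gets its own name; the default RangeIndex follows sample order
        qa_frame = pd.DataFrame(
            {
                "original_context": [
                    sample.original_context or "" for sample in qa_samples
                ],
                "original_question": [sample.original_question for sample in qa_samples],
                "expected_results": [sample.expected_results for sample in qa_samples],
            }
        )
        return qa_frame
    raise TypeError("expected a DataFrame or a non-empty list of QA samples")
```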
langtest/transform/accuracy.py:1231
- The new qa_evaluation method does not appear to be covered by tests. Consider adding tests for this method.
accuracy_score1, accuracy_score2 = DegradationAnalysis.qa_evaluation(samples, X_test)
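Because the internals of qa_evaluation are not shown here, the sketch below tests a hypothetical stand-in, exact_match_scores, that returns case-insensitive exact-match accuracy on the original and perturbed predictions. Exact matching is an assumption, not necessarily how the PR scores answers; the point is the shape of the cases worth covering: a score drop after perturbation, empty input, and mismatched lengths.

```python
from typing import List, Tuple


def exact_match_scores(
    expected: List[str], original_preds: List[str], perturbed_preds: List[str]
) -> Tuple[float, float]:
    """Hypothetical stand-in: (accuracy before, accuracy after) perturbation."""
    if not (len(expected) == len(original_preds) == len(perturbed_preds)):
        raise ValueError("expected and predictions must have equal length")
    if not expected:
        return 0.0, 0.0  # avoid dividing by zero on an empty test set

    def accuracy(preds: List[str]) -> float:
        # Case- and whitespace-insensitive exact match against the gold answer
        hits = sum(
            gold.strip().lower() == pred.strip().lower()
            for gold, pred in zip(expected, preds)
        )
        return hits / len(expected)

    return accuracy(original_preds), accuracy(perturbed_preds)


def test_score_drops_after_perturbation() -> None:
    before, after = exact_match_scores(["Paris", "4"], ["paris", "4"], ["Paris", "5"])
    assert (before, after) == (1.0, 0.5)


def test_empty_input_scores_zero() -> None:
    assert exact_match_scores([], [], []) == (0.0, 0.0)


def test_length_mismatch_raises() -> None:
    try:
        exact_match_scores(["Paris"], [], [])
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

These functions follow pytest's naming convention, so pytest would collect them; they can also be called directly.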
Co-authored-by: Copilot <[email protected]>
This pull request introduces significant enhancements to the `langtest/transform/accuracy.py` file, primarily adding support for question-answering tasks. The most important changes include importing the `QASample` class, updating the `DegradationAnalysis` class to support question-answering tasks, and adding a new method for evaluating question-answering tasks.

Support for question-answering tasks:
- `langtest/transform/accuracy.py`: Imported `QASample` from `langtest.utils.custom_types.sample`.
- `class DegradationAnalysis(BaseAccuracy)`: Added "question-answering" to the `supported_tasks` list.

Enhancements to the `run` method:
- `async def run(` in `langtest/transform/accuracy.py`: Added logic to handle `QASample` instances and convert them into a DataFrame for processing. [1] [2]

New method for question-answering evaluation:
- `def qa_evaluation(self, samples: List[QASample], X_test: pd.DataFrame)`: Added a new method to evaluate model performance on question-answering tasks and return accuracy scores for original and perturbed samples.

Improvements to the `show_results` method:
- `def show_results()`: Adjusted the bar plot rendering to ensure the "before" bars are drawn behind the "after" bars, and dynamically adjusted label positions for better clarity.
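The show_results change can be illustrated with matplotlib, where artists with a lower zorder are drawn first and therefore sit behind. This is a sketch under stated assumptions: vertical bars, a narrower "after" bar overlaid on a wider "before" bar, and a label offset proportional to the tallest bar stand in for the PR's actual layout, which is not shown here; plot_before_after is a hypothetical name.

```python
from typing import List

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_before_after(labels: List[str], before: List[float], after: List[float]):
    """Hypothetical sketch: 'before' bars drawn behind overlaid 'after' bars."""
    fig, ax = plt.subplots(figsize=(1.2 * len(labels) + 2, 4))
    positions = list(range(len(labels)))
    # Lower zorder is drawn first, so the wider 'before' bars stay in the background
    before_bars = ax.bar(positions, before, width=0.6, color="lightgray", zorder=1, label="before")
    after_bars = ax.bar(positions, after, width=0.4, color="tab:blue", zorder=2, label="after")
    # Scale the label offset to the data instead of using a fixed distance
    top = max(before + after, default=0.0) or 1.0
    pad = 0.02 * top
    for pos, b, a in zip(positions, before, after):
        # Place each label just above whichever bar is taller at this position
        ax.text(pos, max(b, a) + pad, f"{b:.2f} -> {a:.2f}", ha="center", va="bottom")
    ax.set_ylim(0, top * 1.15)  # headroom so labels stay inside the axes
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.legend()
    return fig, ax, before_bars, after_bars
```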