feat: enhance DegradationAnalysis to support question-answering task #1153

chakravarthik27 · 2024-12-16T16:39:50Z

This pull request adds question-answering support to the DegradationAnalysis class in langtest/transform/accuracy.py. The main changes are importing the QASample class, registering question-answering as a supported task, and adding a new method for evaluating question-answering tasks.

Support for question-answering tasks:

langtest/transform/accuracy.py: Imported QASample from langtest.utils.custom_types.sample.
class DegradationAnalysis(BaseAccuracy): Added "question-answering" to the supported_tasks list.

Enhancements to run method:

async def run(...) in langtest/transform/accuracy.py: Added logic to handle QASample instances and convert them into a DataFrame for processing.
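
The conversion described above might look roughly like the following sketch. QASampleStub is a hypothetical stand-in for langtest's QASample that carries only the three attributes quoted in the review comments (original_context, original_question, expected_results). The column names and the helpers samples_to_columns and is_qa_batch are assumptions for illustration, not the PR's actual code. The result is a plain column dictionary of the kind that could be passed to pd.DataFrame(...).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QASampleStub:
    """Hypothetical stand-in for langtest's QASample (illustration only)."""
    original_question: str
    original_context: Optional[str] = None
    expected_results: Optional[str] = None


def samples_to_columns(samples: List[QASampleStub]) -> Dict[str, list]:
    """Collect per-sample attributes into columns; a missing context becomes ""."""
    return {
        "context": [s.original_context if s.original_context else "" for s in samples],
        "question": [s.original_question for s in samples],
        "expected_results": [s.expected_results for s in samples],
    }


def is_qa_batch(batch: list) -> bool:
    """Mirror the guard quoted in the review: non-empty and first item is a QA sample."""
    return len(batch) > 0 and isinstance(batch[0], QASampleStub)


samples = [
    QASampleStub("Who wrote Hamlet?", "Hamlet is a play by Shakespeare.", "Shakespeare"),
    QASampleStub("What is 2 + 2?", None, "4"),
]
# Only convert when the batch really holds QA samples (assumed dispatch rule).
columns = samples_to_columns(samples) if is_qa_batch(samples) else {}
```

Checking only the first element keeps the dispatch cheap, at the cost of assuming the batch is homogeneous.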

New method for question-answering evaluation:

def qa_evaluation(self, samples: List[QASample], X_test: pd.DataFrame): Added a new method to evaluate model performance on question-answering tasks and return accuracy scores for original and perturbed samples.
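
As a rough illustration of the idea (not the method added in this PR), the sketch below scores one set of predictions made on original inputs and another made on perturbed inputs against the same ground truth. Exact string match after lowercasing is an assumption made for brevity, and the names paired_accuracy and normalize are hypothetical.

```python
from typing import List, Tuple


def normalize(text: str) -> str:
    """Lowercase and strip whitespace so trivially different answers compare equal."""
    return text.strip().lower()


def paired_accuracy(
    ground_truth: List[str],
    original_preds: List[str],
    perturbed_preds: List[str],
) -> Tuple[float, float]:
    """Return (accuracy on original inputs, accuracy on perturbed inputs)."""
    if not ground_truth:
        return 0.0, 0.0
    total = len(ground_truth)
    hits_original = sum(
        normalize(p) == normalize(g) for p, g in zip(original_preds, ground_truth)
    )
    hits_perturbed = sum(
        normalize(p) == normalize(g) for p, g in zip(perturbed_preds, ground_truth)
    )
    return hits_original / total, hits_perturbed / total


before, after = paired_accuracy(
    ground_truth=["Paris", "4", "Shakespeare"],
    original_preds=["paris", "4", "Shakespeare"],
    perturbed_preds=["Paris", "five", "Marlowe"],
)
# The gap between the two scores is the degradation caused by the perturbation.
degradation = before - after
```

Returning both scores, rather than only their difference, lets the caller plot "before" and "after" side by side.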

Improvements to show_results method:

def show_results(): Adjusted the bar plot rendering to ensure the "before" bars are drawn behind the "after" bars and dynamically adjusted label positions for better clarity.

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

langtest/transform/accuracy.py:1337

This line assumes that 'values' will always have at least one element. This could potentially raise an 'IndexError' if 'values' is empty. Consider adding a check to ensure 'values' is not empty before accessing the first element.

ground_truth = X_test[X_test.index == index]["expected_results"].values[0]
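
One way to add the suggested check is a small helper that returns None for an empty result instead of indexing blindly. This is a sketch only: first_or_none is a hypothetical name, and the helper is written against any object that supports len() and indexing, which includes the NumPy array returned by .values.

```python
from typing import Any, Optional, Sequence


def first_or_none(values: Sequence[Any]) -> Optional[Any]:
    """Return the first element, or None when the sequence is empty."""
    return values[0] if len(values) > 0 else None


# Assumed usage with the line quoted above (not executed here):
# ground_truth = first_or_none(
#     X_test[X_test.index == index]["expected_results"].values
# )
```

The caller then has to decide what a None ground truth means, for example skipping that sample.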

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (3)

langtest/transform/accuracy.py:1190

[nitpick] The variable name 'x' is ambiguous. It should be renamed to 'sample' for clarity.

x.original_context if x.original_context else "" for x in X_test

langtest/transform/accuracy.py:1192

[nitpick] The variable name 'x' is ambiguous. It should be renamed to 'sample' for clarity.

x.original_question for x in X_test

langtest/transform/accuracy.py:1193

[nitpick] The variable name 'x' is ambiguous. It should be renamed to 'sample' for clarity.

x.expected_results for x in X_test

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

langtest/transform/accuracy.py:1186

[nitpick] The variable name X_test is used for both a list of QASample and a DataFrame. Consider renaming the DataFrame variable to avoid ambiguity.

if len(X_test) and isinstance(X_test[0], QASample):

langtest/transform/accuracy.py:1231

The new qa_evaluation method is introduced but there is no indication that this new functionality is covered by tests. Ensure that there are tests for this method.

accuracy_score1, accuracy_score2 = DegradationAnalysis.qa_evaluation(samples, X_test)

feat: enhance DegradationAnalysis to support question-answering tasks and add evaluation method

652d688

chakravarthik27 requested a review from Copilot December 16, 2024 16:39

chakravarthik27 self-assigned this Dec 16, 2024

chakravarthik27 linked an issue Dec 16, 2024 that may be closed by this pull request

Support for QA task in Degradation_analysis Test #1152

Closed

Copilot AI reviewed Dec 16, 2024

langtest/transform/accuracy.py (outdated, resolved)

chakravarthik27 added 2 commits December 16, 2024 22:13

feat: skip samples with None ground truth in DegradationAnalysis accuracy calculation

5921cf1

fix: correctly decrement total count when skipping samples with None ground truth in DegradationAnalysis

cc46917
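
The two commits above concern skipping samples without a ground truth and keeping the denominator consistent when doing so. A minimal sketch of that idea, assuming exact-match scoring and using the hypothetical name accuracy_skipping_missing (this is not the code from the commits):

```python
from typing import List, Optional


def accuracy_skipping_missing(
    ground_truth: List[Optional[str]], predictions: List[str]
) -> float:
    """Accuracy over samples that have a ground truth; the rest are excluded."""
    total = len(ground_truth)
    correct = 0
    for truth, pred in zip(ground_truth, predictions):
        if truth is None:
            total -= 1  # keep the denominator in step with the scored samples
            continue
        if pred == truth:
            correct += 1
    # Guard against dividing by zero when every sample was skipped.
    return correct / total if total > 0 else 0.0


score = accuracy_skipping_missing(["a", None, "c", None], ["a", "x", "z", "y"])
```

Without the decrement, the same inputs would be scored as 1 correct out of 4 rather than 1 out of 2, which understates accuracy whenever ground truths are missing.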

chakravarthik27 requested a review from Copilot December 16, 2024 16:46

Copilot AI reviewed Dec 16, 2024

chakravarthik27 added 2 commits December 16, 2024 22:18

fix: handle cases where ground truth is missing in DegradationAnalysis accuracy calculation

c30a310

feat: make qa_evaluation a static method in DegradationAnalysis for question-answering task evaluation

d1c18ae

chakravarthik27 requested a review from Copilot December 17, 2024 06:56

Copilot AI reviewed Dec 17, 2024

refactor: update variable names for clarity in DegradationAnalysis accuracy calculations

c672e5b

chakravarthik27 requested a review from Copilot December 17, 2024 08:05

Copilot AI reviewed Dec 17, 2024

langtest/transform/accuracy.py (outdated, resolved)

Update langtest/transform/accuracy.py

b495108

Co-authored-by: Copilot

chakravarthik27 merged commit 2e45753 into release/2.5.0 Dec 17, 2024
3 checks passed

chakravarthik27 deleted the feature/support-for-qa-task-in-degradation_analysis-test branch December 24, 2024 16:25
