As of writing, there's only a single threshold for the zero-shot topics, used as the cutoff for whether a topic is considered 'found' or not. Having separate thresholds for the positive and negative sides of the equation would let us do more nuanced filtering, like: "It might not be about sports, but it's definitely not about travel."
Consider the case where our threshold is 0.5, the default. If we assume the false-positive rate here is 4%[1], then adding ten negative topics means the odds of accidentally flagging something are 1 - (1 - 0.04)^10, or about 33%.
It would be nice to be able to tune that.
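For concreteness, here's the arithmetic behind that 33% (the 4% per-topic false-positive rate is an assumption, per the footnote):

# Chance that at least one of ten independent negative topics is falsely flagged,
# assuming a 4% per-topic false-positive rate at the 0.5 threshold.
per_topic_fp = 0.04
num_negative_topics = 10
p_any_false_flag = 1 - (1 - per_topic_fp) ** num_negative_topics
print(round(p_any_false_flag, 3))  # 0.335, i.e. roughly 33%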
I imagine the change would be something akin to:
candidate_topics = model_input["valid_topics"] + model_input["invalid_topics"]
# Valid topics get one cutoff, invalid topics another.
thresholds = (
    [self._zero_shot_threshold_valid] * len(model_input["valid_topics"])
    + [self._zero_shot_threshold_invalid] * len(model_input["invalid_topics"])
)
result = self._classifier(text, candidate_topics)
# NOTE: this assumes the classifier returns labels in the same order as
# candidate_topics; if it reorders them, map thresholds by topic name instead.
topics = result["labels"]
scores = result["scores"]
found_topics = []
for topic, score, threshold in zip(topics, scores, thresholds):
    if score > threshold:
        found_topics.append(topic)
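For illustration, here's how split thresholds could play out on a made-up classifier result (the topic names, scores, and threshold values below are all hypothetical):

# Hypothetical numbers to show the effect of splitting the threshold.
valid_topics = ["sports"]
invalid_topics = ["travel"]
scores = {"sports": 0.55, "travel": 0.35}  # made-up classifier scores
threshold_valid = 0.6    # stricter: only flag a valid topic when fairly confident
threshold_invalid = 0.3  # looser: catch off-topic content earlier
found = [t for t in valid_topics if scores[t] > threshold_valid]
found += [t for t in invalid_topics if scores[t] > threshold_invalid]
print(found)  # ['travel'] - a single 0.5 cutoff would instead find 'sports' and miss 'travel'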
[1] Source: lost the original link so the new source is 'trust me, friendo'.
I'm not sure if this merits a separate discussion, but was the default threshold originally selected to optimize for fewer false negatives (to more readily defer to GPT), or was it picked as an overall optimum?