nlp-with-transformers / notebooks
Clustering Data With Embeddings #136

NadimKawwa opened this issue Apr 13, 2024 · 0 comments
NadimKawwa commented Apr 13, 2024 (edited)

Information

The question or comment is about chapter:

- Introduction
- Text Classification
- Transformer Anatomy
- Multilingual Named Entity Recognition
- Text Generation
- Summarization
- Question Answering
- Making Transformers Efficient in Production
- Dealing with Few to No Labels
- Training Transformers from Scratch
- Future Directions

Question or comment

Hello, I noticed that the book doesn't say much about clustering unlabeled data. I'm aware that other resources address this question. However, it would be nice to know which techniques work best for clustering text, especially ones that don't rely on API calls that might be rate-limited.
I have been pondering these issues lately, and the approach that has worked best for me so far is:

1. Generate embeddings.
2. Scale the features with a MinMax scaler.
3. Run an algorithm such as K-means and plot the number of clusters against the silhouette score.
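The three steps above can be sketched with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions: scikit-learn and NumPy are installed, the synthetic `embeddings` array is a hypothetical stand-in for real text embeddings computed locally (step 1 itself is not shown), and the helper name `scan_cluster_counts` and the candidate range of k are illustrative choices, not part of any library.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler


def scan_cluster_counts(embeddings, k_values, random_state=0):
    """Hypothetical helper: MinMax-scale the embeddings, fit K-means for
    each candidate k, and return a dict mapping k -> silhouette score."""
    # Step 2: rescale every embedding dimension to the [0, 1] range.
    X = MinMaxScaler().fit_transform(embeddings)
    scores = {}
    for k in k_values:
        # Step 3: fit K-means with k clusters; several restarts (n_init)
        # lower the chance of settling in a poor local optimum.
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X)
        # Silhouette lies in [-1, 1]; higher means tighter, better-separated clusters.
        scores[k] = silhouette_score(X, labels)
    return scores


# Assumed stand-in for step 1: three well-separated groups of 16-dim vectors.
# In practice these would be sentence embeddings from a locally run model.
rng = np.random.default_rng(42)
centers = rng.normal(scale=5.0, size=(3, 16))
embeddings = np.vstack([c + rng.normal(scale=0.3, size=(50, 16)) for c in centers])

scores = scan_cluster_counts(embeddings, range(2, 7))
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

The scores are printed rather than plotted; passing `list(scores)` and `list(scores.values())` to a plotting library such as matplotlib gives the number-of-clusters versus silhouette-score curve. Picking the k with the highest score is a reasonable heuristic, not a guarantee, especially on real text embeddings where clusters are rarely this cleanly separated.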

I would appreciate your thoughts on this.

Best,
Nadim
