cpu: Shift score for SSE4.2 #1116

Open
wants to merge 1 commit into
base: master

ckastner
Contributor

It seems that SSE4.2 first appeared in Nehalem, and FMA/F16C in Ivy Bridge. This precedence would also match that given by the microarchitectural levels agreed to by Intel and AMD:

https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels

@slaren
Member

slaren commented Feb 17, 2025

The score given to each feature should roughly represent the performance gain expected from it. SSE4.2 is much more important than F16C or FMA by itself. In practice F16C and FMA are always used together with AVX or later, so this doesn't really matter either way.
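As a rough illustration of scoring by expected gain, here is a minimal sketch. The feature names, bit values, and weights are hypothetical and are not taken from the project's code; they only show how power-of-two weights let one important feature outrank any combination of less important ones.

```cpp
#include <cstdint>

// Hypothetical feature bits (illustrative only, not the project's identifiers).
enum Feature : std::uint32_t {
    FEAT_FMA   = 1u << 0,
    FEAT_F16C  = 1u << 1,
    FEAT_SSE42 = 1u << 2,
    FEAT_AVX   = 1u << 3,
};

// Hypothetical weights: a larger weight stands for a larger expected speedup.
struct FeatureWeight {
    std::uint32_t bit;
    int weight;
};

static const FeatureWeight kWeights[] = {
    { FEAT_FMA,   1 << 0 },
    { FEAT_F16C,  1 << 1 },
    { FEAT_SSE42, 1 << 2 },
    { FEAT_AVX,   1 << 3 },
};

// Sum the weights of all features a build variant relies on.
int variant_score(std::uint32_t required_features) {
    int score = 0;
    for (const FeatureWeight& w : kWeights) {
        if (required_features & w.bit) {
            score += w.weight;
        }
    }
    return score;
}
```

With these assumed weights, `variant_score(FEAT_SSE42)` is larger than `variant_score(FEAT_FMA | FEAT_F16C)`, so a variant using SSE4.2 alone would be preferred over one using only FMA and F16C.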

@ckastner
Contributor Author

It seems I misunderstood the purpose of the score, thanks for the clarification.

Though the check for SSE4.2 should still precede the check for F16C/FMA, right? Nehalem has SSE4.2 but not FMA/F16C, and unless I'm mistaken, SSE4.2 would not get picked up there, because the checks for FMA/F16C come first and return early if those features are unavailable.
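The early-return behavior in question can be sketched as follows. This is a hypothetical model, not the project's actual function: the struct, the function names, and the weights are assumptions, and the sketch assumes every check is a hard requirement of the variant being scored.

```cpp
// Hypothetical capability flags for the CPU being probed.
struct CpuFeatures {
    bool sse42;
    bool f16c;
    bool fma;
};

// Checks FMA and F16C before SSE4.2. A missing feature returns 0,
// meaning "this variant cannot run here", before later checks execute.
int score_fma_first(const CpuFeatures& cpu) {
    int score = 0;
    if (!cpu.fma)   { return 0; }
    score += 1 << 0;
    if (!cpu.f16c)  { return 0; }
    score += 1 << 1;
    if (!cpu.sse42) { return 0; }
    score += 1 << 2;
    return score;
}

// Same requirements and weights, with the SSE4.2 check moved to the front.
int score_sse42_first(const CpuFeatures& cpu) {
    int score = 0;
    if (!cpu.sse42) { return 0; }
    score += 1 << 2;
    if (!cpu.f16c)  { return 0; }
    score += 1 << 1;
    if (!cpu.fma)   { return 0; }
    score += 1 << 0;
    return score;
}
```

Under that assumption, a Nehalem-like CPU (`sse42 = true`, `f16c = false`, `fma = false`) gets a score of 0 from both functions, so reordering alone does not change the outcome. The order would only matter if some checks were optional for a given variant, which this thread does not establish.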
