
Add Roberta converter #2124

Open
omkar-334 wants to merge 4 commits into master
Conversation

@omkar-334 commented Mar 4, 2025

A few doubts -

  1. The model outputs from Keras and Hugging Face are not similar at all.
from transformers import RobertaTokenizer, TFRobertaModel
import keras_hub

hf_model = TFRobertaModel.from_pretrained("roberta-base", output_hidden_states=True)
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# Keras model converted from the same checkpoint (loaded here in bfloat16, as discussed below)
model = keras_hub.models.RobertaBackbone.from_preset("hf://FacebookAI/roberta-base", dtype="bfloat16")

text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="tf", padding=True, truncation=True)
hf_output = hf_model(**inputs).last_hidden_state

keras_inputs = {
    "token_ids": inputs["input_ids"].numpy(),  # token IDs
    "padding_mask": inputs["attention_mask"].numpy(),  # padding mask
}
keras_output = model(keras_inputs)

(Output comparison screenshot)

  2. Hugging Face's RoBERTa uses 514 position embeddings (512 positions + 2 extra tokens), whereas Keras only expects 512 (see the sketch after this list).

  3. Tokenizer comparison (screenshot)
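For reference on the 514 vs. 512 mismatch, here is a minimal sketch of how the Hugging Face side derives position ids, based on my reading of transformers' create_position_ids_from_input_ids helper (treat the details as assumptions): non-padding tokens are numbered starting from padding_idx + 1 = 2, so a 512-token sequence needs indices up to 513 and hence a 514-row embedding table, while the Keras backbone, as I understand it, indexes positions from 0.

import tensorflow as tf

def create_position_ids(input_ids, padding_idx=1):
    # Non-pad tokens get positions padding_idx + 1, padding_idx + 2, ...
    # and pad tokens keep position padding_idx (mirrors the HF helper).
    mask = tf.cast(tf.not_equal(input_ids, padding_idx), tf.int32)
    incremental_indices = tf.cumsum(mask, axis=1) * mask
    return incremental_indices + padding_idx

# RoBERTa special ids: <s>=0, pad=1, </s>=2; the other ids are placeholders.
ids = tf.constant([[0, 100, 200, 300, 2, 1, 1]])
print(create_position_ids(ids))  # -> [[2 3 4 5 6 1 1]], i.e. offset by 2 relative to 0-based positions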


google-cla bot commented Mar 4, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

@omkar-334 (Author)

Here's the link to the testing colab - https://colab.research.google.com/github/omkar-334/keras-scripts/blob/main/RoBERTa_converter.ipynb

Also,
RoBERTa doesn't have segment embeddings or a pooled output, but the Hugging Face model includes a constant segment (token-type) embedding, of shape (1, 512) I think. It also includes a pooler layer for downstream tasks, which the Keras implementation doesn't have.
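One quick way to confirm which weights exist only on the Hugging Face side is to inspect the checkpoint directly. A rough sketch, using the PyTorch RobertaModel only because its attribute names are easy to print (treat the exact attribute paths and the hf_pt_model name as assumptions, not part of this PR):

from transformers import RobertaModel

hf_pt_model = RobertaModel.from_pretrained("roberta-base")

# Token-type (segment) embedding table: RoBERTa never varies the segment id,
# so this is a single constant row of width hidden_dim.
print(hf_pt_model.embeddings.token_type_embeddings.weight.shape)

# Position embedding table: 514 rows rather than 512 (see the offset discussion above).
print(hf_pt_model.embeddings.position_embeddings.weight.shape)

# Pooler used by downstream classification heads; absent from the Keras backbone.
print(hf_pt_model.pooler)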

@JyotinderSingh (Collaborator)

Hi @omkar-334, thanks for this PR.
Regarding the mismatched logits, it looks like you're loading the Hugging Face model in float32

hf_model = TFRobertaModel.from_pretrained("roberta-base")

while the Keras model is being loaded in bfloat16.

model = keras_hub.models.RobertaBackbone.from_preset("hf://FacebookAI/roberta-base", dtype="bfloat16")

It might be worth loading both in the same precision when verifying the logits.
(screenshot attached)
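For example, loading both checkpoints in float32 before comparing might look like this (a minimal sketch; the "hf://FacebookAI/roberta-base" preset string is the one already used in this PR's notebook):

import keras_hub
from transformers import TFRobertaModel

# Both models in float32, so any remaining difference is not a precision artifact.
hf_model = TFRobertaModel.from_pretrained("roberta-base")  # float32 by default
keras_model = keras_hub.models.RobertaBackbone.from_preset(
    "hf://FacebookAI/roberta-base", dtype="float32"
)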

@JyotinderSingh (Collaborator)

I did try to run your notebook by loading in both sets of weights as float32, but the results still don't seem to match.

Hugging Face output
 tf.Tensor(
[[[-0.06098365  0.1249077  -0.01024082 ... -0.05549879 -0.05278065
   -0.02032274]
  [-0.33764896  0.20138153  0.07472473 ...  0.16684803  0.02431546
   -0.13936469]
  [-0.02943649  0.23096977  0.18173131 ... -0.14693598 -0.05403079
   -0.02496235]
  ...
  [-0.11631897  0.2576879   0.0894694  ... -0.01494528  0.07766235
    0.03402137]
  [-0.07787547  0.2642327   0.44699728 ... -0.7686613   0.02006039
    0.07307038]
  [-0.0507166   0.14344664 -0.03572293 ... -0.10117416 -0.05277743
   -0.05274259]]], shape=(1, 8, 768), dtype=float32)

Keras output
 tf.Tensor(
[[[-7.23304749e-02  1.11076608e-01 -7.59335235e-04 ... -9.13275555e-02
   -4.67573255e-02 -2.74974313e-02]
  [-1.94556322e-02  7.97019601e-02  1.06528938e-01 ... -2.88743407e-01
   -1.66224763e-02  5.16433269e-02]
  [-7.21454322e-02  1.10889256e-01 -5.06145880e-04 ... -9.06197801e-02
   -4.66767214e-02 -2.69957650e-02]
  ...
  [-3.83148864e-02  1.94189698e-01  2.10571475e-03 ...  6.51391894e-02
   -4.42184880e-03  4.94358130e-02]
  [ 2.18277685e-02  1.65410444e-01  3.22254300e-01 ... -5.11629343e-01
    3.71083468e-02  8.14208537e-02]
  [-6.67822510e-02  1.25953302e-01 -1.97500065e-02 ... -1.38220027e-01
   -4.70701084e-02 -5.49893379e-02]]], shape=(1, 8, 768), dtype=float32)
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=1e-05
...
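The tolerance message above is the kind produced by NumPy's allclose-style assertions. A minimal sketch of such a check, assuming hf_output and keras_output come from the snippets earlier in the thread:

import numpy as np
import keras

# Element-wise comparison of the two [batch, seq_len, hidden] activations.
np.testing.assert_allclose(
    hf_output.numpy(),
    keras.ops.convert_to_numpy(keras_output),
    rtol=1e-7,
    atol=1e-5,
)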
