Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Json length is larger than packet length #398

Closed
jongwony opened this issue Mar 2, 2023 · 6 comments
Closed

Json length is larger than packet length #398

jongwony opened this issue Mar 2, 2023 · 6 comments

Comments

@jongwony
Copy link

jongwony commented Mar 2, 2023

environment: MySQL 8.0
python: 3.11.1

Hexdump

# at 41168404
#230302 17:55:52 server id 62903779  end_log_pos 41169433 CRC32 0x92e2e089 
# Position  Timestamp   Type   Master ID        Size      Master Pos    Flags 
# 02742e14 98 64 00 64   1f   e3 d5 bf 03   05 04 00 00   19 32 74 02   00 00
# 02742e27 62 00 00 00 00 00 01 00  02 00 06 ff ff 00 01 00 |b...............|
# 02742e37 00 00 18 5a 35 6a 6f 57  4b 52 49 41 4c 31 38 44 |...Z5joWKRIAL18D|
# 02742e47 74 49 52 70 76 73 50 55  38 6f 6d 30 00 4e 33 64 |tIRpvsPU8om0.N3d|
# 02742e57 5a 4a 39 64 6d 34 72 64  48 52 41 52 4a 6c 43 6b |ZJ9dm4rdHRARJlCk|
# 02742e67 79 31 4a 6c 36 56 37 6b  59 77 34 66 34 53 59 37 |y1Jl6V7kYw4f4SY7|
# 02742e77 4b 53 39 30 57 47 53 74  7a 37 36 41 62 a3 01 00 |KS90WGStz76Ab...|
# 02742e87 00 00 05 00 a2 01 3a 00  0a 00 44 00 0b 00 4f 00 |..........D...O.|
# 02742e97 0b 00 5a 00 0d 00 75 00  1a 00 0c 9e 00 0c b5 00 |..Z...u.........|
# 02742ea7 02 bb 00 02 f3 00 0c 8f  01 00 02 f3 00 0c 8f 01 |................|
# 02742eb7 00 02 83 01 0c 8f 01 73  63 6f 70 65 63 6c 69 65 |.......scopeclie|
# 02742ec7 6e 74 5f 75 72 69 63 6c  69 65 6e 74 5f 6e 61 6d |nt.uriclient.nam|
# 02742ed7 65 67 72 61 6e 74 5f 74  79 70 65 73 72 65 64 69 |egrant.typesredi|
# 02742ee7 72 65 63 74 5f 75 72 69  73 72 65 73 70 6f 6e 73 |rect.urisrespons|
# 02742ef7 65 5f 74 79 70 65 73 74  6f 6b 65 6e 5f 65 6e 64 |e.typestoken.end|
# 02742f07 70 6f 69 6e 74 5f 61 75  74 68 5f 6d 65 74 68 6f |point.auth.metho|
# 02742f17 64 0e 6f 70 65 6e 69 64  20 70 72 6f 66 69 6c 65 |d.openid.profile|
# 02742f27 16 68 74 74 70 73 3a 2f  2f 69 64 2e 77 61 6e 74 |.https...id.want|
# 02742f37 65 64 2e 6a 6f 62 73 05  6f 6e 65 69 64 05 00 38 |ed.jobs.oneid..8|
# 02742f47 00 0c 13 00 0c 1c 00 0c  22 00 0c 29 00 0c 32 00 |..............2.|
# 02742f57 08 70 61 73 73 77 6f 72  64 05 6b 61 6b 61 6f 06 |.password.kakao.|
# 02742f67 67 6f 6f 67 6c 65 08 66  61 63 65 62 6f 6f 6b 05 |google.facebook.|
# 02742f77 61 70 70 6c 65 04 00 90  00 0c 10 00 0c 2b 00 0c |apple...........|
# 02742f87 48 00 0c 6b 00 1a 68 74  74 70 73 3a 2f 2f 69 64 |H..k..https...id|
# 02742f97 2d 61 70 69 2e 77 61 6e  74 65 64 2e 6a 6f 62 73 |.api.wanted.jobs|
# 02742fa7 1c 68 74 74 70 73 3a 2f  2f 69 64 2d 61 64 6d 69 |.https...id.admi|
# 02742fb7 6e 2e 77 61 6e 74 65 64  2e 6a 6f 62 73 22 68 74 |n.wanted.jobs.ht|
# 02742fc7 74 70 73 3a 2f 2f 69 64  2d 61 70 69 2d 77 77 77 |tps...id.api.www|
# 02742fd7 74 65 73 74 2e 77 61 6e  74 65 64 2e 6a 6f 62 73 |test.wanted.jobs|
# 02742fe7 24 68 74 74 70 73 3a 2f  2f 69 64 2d 61 64 6d 69 |.https...id.admi|
# 02742ff7 6e 2d 77 77 77 74 65 73  74 2e 77 61 6e 74 65 64 |n.wwwtest.wanted|
# 02743007 2e 6a 6f 62 73 01 00 0c  00 0c 07 00 04 63 6f 64 |.jobs........cod|
# 02743017 65 12 63 6c 69 65 6e 74  5f 73 65 63 72 65 74 5f |e.client.secret.|
# 02743027 70 6f 73 74 99 ad 28 58  2a 99 ae aa e5 42 00 01 |post...X.....B..|
# 02743037 00 00 00 18 5a 35 6a 6f  57 4b 52 49 41 4c 31 38 |....Z5joWKRIAL18|
# 02743047 44 74 49 52 70 76 73 50  55 38 6f 6d 30 00 4e 33 |DtIRpvsPU8om0.N3|
# 02743057 64 5a 4a 39 64 6d 34 72  64 48 52 41 52 4a 6c 43 |dZJ9dm4rdHRARJlC|
# 02743067 6b 79 31 4a 6c 36 56 37  6b 59 77 34 66 34 53 59 |ky1Jl6V7kYw4f4SY|
# 02743077 37 4b 53 39 30 57 47 53  74 7a 37 36 41 62 82 01 |7KS90WGStz76Ab..|
# 02743087 00 00 00 06 00 81 01 2e  00 0a 00 38 00 0b 00 43 |...........8...C|
# 02743097 00 0b 00 4e 00 0d 00 5b  00 10 00 6b 00 1a 00 0c |...N.......k....|
# 027430a7 85 00 0c 9c 00 02 a2 00  02 da 00 02 6a 01 0c 6e |............j..n|
# 027430b7 01 63 6c 69 65 6e 74 5f  75 72 69 63 6c 69 65 6e |.client.uriclien|
# 027430c7 74 5f 6e 61 6d 65 67 72  61 6e 74 5f 74 79 70 65 |t.namegrant.type|
# 027430d7 73 72 65 64 69 72 65 63  74 5f 75 72 69 73 75 70 |sredirect.urisup|
# 027430e7 64 61 74 65 5f 75 73 65  72 5f 75 72 69 73 74 6f |date.user.uristo|
# 027430f7 6b 65 6e 5f 65 6e 64 70  6f 69 6e 74 5f 61 75 74 |ken.endpoint.aut|
# 02743107 68 5f 6d 65 74 68 6f 64  16 68 74 74 70 73 3a 2f |h.method.https..|
# 02743117 2f 69 64 2e 77 61 6e 74  65 64 2e 6a 6f 62 73 05 |.id.wanted.jobs.|
# 02743127 6f 6e 65 69 64 05 00 38  00 0c 13 00 0c 1c 00 0c |oneid..8........|
# 02743137 22 00 0c 29 00 0c 32 00  08 70 61 73 73 77 6f 72 |......2..passwor|
# 02743147 64 05 6b 61 6b 61 6f 06  67 6f 6f 67 6c 65 08 66 |d.kakao.google.f|
# 02743157 61 63 65 62 6f 6f 6b 05  61 70 70 6c 65 04 00 90 |acebook.apple...|
# 02743167 00 0c 10 00 0c 2b 00 0c  48 00 0c 6b 00 1a 68 74 |........H..k..ht|
# 02743177 74 70 73 3a 2f 2f 69 64  2d 61 70 69 2e 77 61 6e |tps...id.api.wan|
# 02743187 74 65 64 2e 6a 6f 62 73  1c 68 74 74 70 73 3a 2f |ted.jobs.https..|
# 02743197 2f 69 64 2d 61 64 6d 69  6e 2e 77 61 6e 74 65 64 |.id.admin.wanted|
# 027431a7 2e 6a 6f 62 73 22 68 74  74 70 73 3a 2f 2f 69 64 |.jobs.https...id|
# 027431b7 2d 61 70 69 2d 77 77 77  74 65 73 74 2e 77 61 6e |.api.wwwtest.wan|
# 027431c7 74 65 64 2e 6a 6f 62 73  24 68 74 74 70 73 3a 2f |ted.jobs.https..|
# 027431d7 2f 69 64 2d 61 64 6d 69  6e 2d 77 77 77 74 65 73 |.id.admin.wwwtes|
# 027431e7 74 2e 77 61 6e 74 65 64  2e 6a 6f 62 73 00 00 04 |t.wanted.jobs...|
# 027431f7 00 12 63 6c 69 65 6e 74  5f 73 65 63 72 65 74 5f |..client.secret.|
# 02743207 70 6f 73 74 99 ad 28 58  2a 99 ae aa e5 42 89 e0 |post...X.....B..|
# 02743217 e2 92                                            |..|
# 	Update_rows: table id 98 flags: STMT_END_F
### UPDATE `oneid`.`service_client`
### WHERE
###   @1=1
###   @2='Z5joWKRIAL18DtIRpvsPU8om'
###   @3='N3dZJ9dm4rdHRARJlCky1Jl6V7kYw4f4SY7KS90WGStz76Ab'
###   @4='{"client_uri": "https://id.wanted.jobs", "client_name": "oneid", "grant_types": ["password", "kakao", "google", "facebook", "apple"], "redirect_uris": ["https://id-api.wanted.jobs", "https://id-admin.wanted.jobs", "https://id-api-wwwtest.wanted.jobs", "https://id-admin-wwwtest.wanted.jobs"], "token_endpoint_auth_method": "client_secret_post"}'
###   @5='2022-06-20 05:32:42'
###   @6='2022-12-21 14:21:02'
### SET
###   @1=1
###   @2='Z5joWKRIAL18DtIRpvsPU8om'
###   @3='N3dZJ9dm4rdHRARJlCky1Jl6V7kYw4f4SY7KS90WGStz76Ab'
###   @4='{"client_uri": "https://id.wanted.jobs", "client_name": "oneid", "grant_types": ["password", "kakao", "google", "facebook", "apple"], "redirect_uris": ["https://id-api.wanted.jobs", "https://id-admin.wanted.jobs", "https://id-api-wwwtest.wanted.jobs", "https://id-admin-wwwtest.wanted.jobs"], "update_user_uris": [], "token_endpoint_auth_method": "client_secret_post"}'
###   @5='2022-06-20 05:32:42'
###   @6='2022-12-21 14:21:02'
ROLLBACK /* added by mysqlbinlog */ /*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

Above hexdump data @4 is very long JSON type value, I think the error occurs because it doesn't properly recognize the large JSON.

Traceback

File ~/Downloads/github/data_weaver/cdc_producer/__main__.py:143, in KafkaConnector.produce(self)
    140     continue
    142 data = []
--> 143 for row in event.rows:
    144     # intersection update keys
    145     if event.event_type == const.UPDATE_ROWS_EVENT_V2:
    146         diff_keys = json.dumps(one_depth_xor(row['before_values'], row['after_values']), ensure_ascii=False)

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/row_event.py:459, in RowsEvent.rows(self)
    456 @property
    457 def rows(self):
    458     if self.__rows is None:
--> 459         self._fetch_rows()
    460     return self.__rows

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/row_event.py:454, in RowsEvent._fetch_rows(self)
    451     return
    453 while self.packet.read_bytes < self.event_size:
--> 454     self.__rows.append(self._fetch_one_row())

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/row_event.py:543, in UpdateRowsEvent._fetch_one_row(self)
    540 def _fetch_one_row(self):
    541     row = {}
--> 543     row["before_values"] = self._read_column_data(self.columns_present_bitmap)
    545     row["after_values"] = self._read_column_data(self.columns_present_bitmap2)
    546     return row

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/row_event.py:207, in RowsEvent._read_column_data(self, cols_bitmap)
    204     values[name] = self.packet.read_length_coded_pascal_string(
    205         column.length_size)
    206 elif column.type == FIELD_TYPE.JSON:
--> 207     values[name] = self.packet.read_binary_json(column.length_size)
    208 else:
    209     raise NotImplementedError("Unknown MySQL column type: %d" %
    210                               (column.type))

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:359, in BinLogPacketWrapper.read_binary_json(self, size)
    356 self.unread(payload)
    357 t = self.read_uint8()
--> 359 return self.read_binary_json_type(t, length)

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
size = 4
ipdb> p payload
b'\x00\x05\x00\xa2\x01:\x00\n\x00D\x00\x0b\x00O\x00\x0b\x00Z\x00\r\x00u\x00\x1a\x00\x0c\x9e\x00\x0c\xb5\x00\x02\xbb\x00\x02\xf3\x00\x0c\x8f\x01\x00\x02\xf3\x00\x0c\x8f\x01\x00\x02\x83\x01\x0c\x8f\x01scopeclient_uriclient_namegrant_typesredirect_urisresponse_typestoken_endpoint_auth_method\x0eopenid profile\x16https://id.wanted.jobs\x05oneid\x05\x008\x00\x0c\x13\x00\x0c\x1c\x00\x0c"\x00\x0c)\x00\x0c2\x00\x08password\x05kakao\x06google\x08facebook\x05apple\x04\x00\x90\x00\x0c\x10\x00\x0c+\x00\x0cH\x00\x0ck\x00\x1ahttps://id-api.wanted.jobs\x1chttps://id-admin.wanted.jobs"https://id-api-wwwtest.wanted.jobs$https://id-admin-wwwtest.wanted.jobs\x01\x00\x0c\x00\x0c\x07\x00\x04code\x12client_secret_post'

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:364, in BinLogPacketWrapper.read_binary_json_type(self, t, length)
    362 large = (t in (JSONB_TYPE_LARGE_OBJECT, JSONB_TYPE_LARGE_ARRAY))
    363 if t in (JSONB_TYPE_SMALL_OBJECT, JSONB_TYPE_LARGE_OBJECT):
--> 364     return self.read_binary_json_object(length - 1, large)
    365 elif t in (JSONB_TYPE_SMALL_ARRAY, JSONB_TYPE_LARGE_ARRAY):
    366     return self.read_binary_json_array(length - 1, large)

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
t = 0
length = 419

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:447, in BinLogPacketWrapper.read_binary_json_object(self, length, large)
    445     else:
    446         t = value_type_inlined_lengths[i][0]
--> 447         data = self.read_binary_json_type(t, length)
    448     out[keys[i]] = data
    450 return out

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
length = 418
large = False
ipdb> p out
{b'\x00\x02\xf3\x00\x0c\x8f\x01\x00\x02\x83': b'_typestoken_endpoint_auth_method\x0eopenid profile\x16https://id.wanted.jobs\x05oneid\x05\x008\x00\x0c\x13\x00\x0c\x1c\x00\x0c"\x00\x0c)\x00\x0c2\x00\x08passw', b'\x01\x0c\x8f\x01scopecl': b'rd\x05kakao\x06google\x08facebook\x05apple\x04\x00\x90\x00\x0c\x10\x00\x0c+\x00\x0cH\x00\x0ck\x00\x1ahttps://id-api.wanted.jobs\x1chttps://id-admin.wanted.jobs"https://'}

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:366, in BinLogPacketWrapper.read_binary_json_type(self, t, length)
    364     return self.read_binary_json_object(length - 1, large)
    365 elif t in (JSONB_TYPE_SMALL_ARRAY, JSONB_TYPE_LARGE_ARRAY):
--> 366     return self.read_binary_json_array(length - 1, large)
    367 elif t in (JSONB_TYPE_STRING,):
    368     return self.read_variable_length_string()

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
t = 2
length = 418

File ~/.pyenv/versions/3.11.1/envs/cdc_connector/lib/python3.11/site-packages/pymysqlreplication/packet.py:461, in BinLogPacketWrapper.read_binary_json_array(self, length, large)
    458     size = self.read_uint16()
    460 if size > length:
--> 461     raise ValueError('Json length is larger than packet length')
    463 values_type_offset_inline = [
    464     read_offset_or_inline(self, large)
    465     for _ in range(elements)]
    467 def _read(x):

ipdb> a
self = <pymysqlreplication.packet.BinLogPacketWrapper object at 0x103968cd0>
length = 417
large = False

ValueError: Json length is larger than packet length

I am working hard on debugging myself with ipdb debugger., but I would like to share it would be nice to get help in understanding the hexdump.

Looking at out , it seems that the parsing is not going well.

ipdb> p out
{b'\x00\x02\xf3\x00\x0c\x8f\x01\x00\x02\x83': b'_typestoken_endpoint_auth_method\x0eopenid profile\x16https://id.wanted.jobs\x05oneid\x05\x008\x00\x0c\x13\x00\x0c\x1c\x00\x0c"\x00\x0c)\x00\x0c2\x00\x08passw', b'\x01\x0c\x8f\x01scopecl': b'rd\x05kakao\x06google\x08facebook\x05apple\x04\x00\x90\x00\x0c\x10\x00\x0c+\x00\x0cH\x00\x0ck\x00\x1ahttps://id-api.wanted.jobs\x1chttps://id-admin.wanted.jobs"https://'}

I've been running this environment for quite some time. For small JSON s this was fine.

@xufei4987
Copy link

I have the same problem

@channss
Copy link

channss commented Mar 6, 2023

i have the same problem too.
What i've found out about this issue is so far is below:
def read_binary_json_object(self, length, large) function from packet.py is used when reading json objects. There is no problem with small json object. But when reading a long json object, a weird unknown key named 'response_types' is inserted, ruining the BinLogPacketWrapper class's __data_buffer value. Thus the offset does not match the buffer making it impossible to read correctly.

@nkr-404-NotFound
Copy link

in my case, #403 , no error is returning to program, but cpu goes up to 100% for the all time.
also, i have very long json data in database fields.

have no idea if there is any connection with this issue.
maybe can check it btw.

@jongwony
Copy link
Author

jongwony commented Jul 2, 2023

@nankingrider I checked your issue. This issue is not long json value, is a very rare case. In my binlog dump case, MySQL 8.0 are jump the junk data. but read_binary_json_type method are conflict the junk data. I think MySQL binlog is handle the junk(maybe memory eviction...?), so we need also add handler based of #182

@nkr-404-NotFound
Copy link

nkr-404-NotFound commented Jul 3, 2023

@jongwony Hi , thanks for the reply.!

  1. i'm using mysql 5.7.
  2. in py-bt, i can see the program looping in the READ function, with data(not junk data) correct from my records, which is json type, 300k elements, 12 bytes per element ( ip address ), total length 576Kb
  3. i tried to batch the op sql upstream to shrink the json length, works fine ( but not very perfect solution... )

a reasonable guess is maybe mysql generate junk data when json type field is too long ...

still trying to figure how to fix it.

@sean-k1
Copy link
Collaborator

sean-k1 commented Sep 20, 2023

@jongwony @channss @xufei4987
hello?
I'm pretty not sure This Pr resolve this Issue
Can you check this Issue in this PR?

I need your Many test cases.
IF error goes on, Please let me know how to reproduce this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants