Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE: If the input string starts with 0 then that 0 is lost during encoding. #57

Open
tejzpr opened this issue Jan 25, 2018 · 12 comments

Comments

@tejzpr
Copy link

tejzpr commented Jan 25, 2018

If the input string starts with 0 then that 0 is lost during encoding.
for e.g. if the input is 0zebra then b62.encode(b62.decode("0zebra")) outputs zebra
b62.encode(b62.decode("0000zebra")) //zebra

tejzpr added a commit to tejzpr/base62.js that referenced this issue Jan 25, 2018
Fixes If the input string starts with 0 then that 0 is lost during encoding.
@ghost
Copy link

ghost commented Jan 29, 2018

I don't think that's an issue, since all type of encoder/decoder works by formatting binary(base2) then transforming that binary to user preferred base number in this case 62 so '0x00000A' is written '10' instead of '0000010'

Sure can add a feature for supporting this kind of cases by providing a flag with count of zeroes 3$13659 which interpreted as additional zero in front of 13659 or by adding one safe char that transform zeroes in front to that special char

@bradisbell
Copy link

@gunawanwijaya That's absolutely crazy. Such an encoder/decoder should work with arbitrary binary data, regardless of what it starts with. The length of the data matters, and must be preserved. I appreciate @tejzpr for flagging this issue. If the bug is as-described, this library should be considered completely unreliable until it is resolved.

@FND
Copy link
Collaborator

FND commented Apr 13, 2018

FWIW, my understanding is also that Base62 encoding means converting between numbers and strings (indeed, the README says as much - though I had a hand in tweaking docs recently). That is, we're always talking about numbers (positive integers), merely offering a different representation.

To me this means that leading zeros are not significant and thus cannot be preserved: Both "0z" and "z" are equivalent to 35, 35 is encoded as "z".

While I understand that how this might be confusing if your expectation is for a string (rather than a number) to be encoded, I don't see what could be done about that without fundamentally altering both implementation and interpretation of what Base62 means.

Thus I'm afraid I wouldn't consider this a bug.

@dcorking
Copy link

dcorking commented Jun 6, 2018

I agree with FND - this is merely a consequence of the default charset mapping the character 0 to the number 0, purely a coincidence. So the leading zeroes in tejzpr's example are dropped during decoding, which is exactly what I want to happen.

@bradisbell
Copy link

If you can't handle strings, you should consider throwing an error when a string is passed in.

@FND
Copy link
Collaborator

FND commented Jun 7, 2018

That's a good point, actually. I'd considered input validation in v2.0, but ultimately decided against it - in part because it would result in breaking changes, which we were adamant to avoid.

One could imagine a strict mode to opt into (e.g. encode(int, { validate: true }), but that seems a bit awkward?

@bradisbell
Copy link

If it were me, I'd increment the version to v3 and make the change. If you're breaking anyone with this, there's a reasonable chance what they're doing is already broken and they don't know it.

@FND
Copy link
Collaborator

FND commented Jun 8, 2018

Well, I'd still be wary of breaking changes, even with a[nother] major version: We made sure that v2.0 maintains backwards compatibility (because there's a large number of existing users), so calling it v3.0 doesn't necessarily invalidate that requirement. @andrew, what do you think?

For the record, in #70 I'd originally set out to revamp the API entirely, but ultimately concluded it wasn't worth sacrificing backwards compatibility. So starting with a blank slate is certainly appealing, I'm just not sure how it would impact existing users (e.g. because they're not locked to a particular version).

@andrew
Copy link
Member

andrew commented Jun 8, 2018

Looking at https://libraries.io/npm/base62/usage, very few people would automatically pick up a new major version, in fact doesn't look like anyone in open source is using v2.0 yet, so a complete rewrite in 3.0 wouldn't cause any issues as far as I can see

@FND
Copy link
Collaborator

FND commented Jun 21, 2018

FWIW: I'd started porting #70's pure-ES6 approach while adding input validation on a branch, but it's very much incomplete and I'm currently too busy to turn it into a proper PR; will revisit when I get to it.

@pauldps
Copy link

pauldps commented Jul 27, 2018

I have to agree with FND and wouldn't consider this a bug either.

If I understand correctly, this library only handles number positional notation. As such, only numbers can be encoded, and as with any number representation, leading zeroes aren't preserved.

It's not like a full binary-to-text encoding process like Base64, and I think that's where the confusion is coming from.

@iskrid
Copy link

iskrid commented Jan 9, 2019

I agree, this should not be tagged as bug, it's a lib number representation and not for "encoding", maybe the naming of the methods was not ideal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants