prefer "datetime" types over "timestamp" types #54256

Open
wants to merge 1 commit into
base: master

browner12
Contributor

@browner12 browner12 commented Jan 18, 2025

I know this has been brought up before, but I'm going to make the case again why datetimes are the superior data type compared to timestamps, and why Laravel should make these their recommended and default types for v12 and beyond.

Premises

  • All dates should be stored as UTC values (although we will not hinder people who choose to do otherwise)
  • Any specifics given in this PR are for MySQL

What this PR does NOT do

  • force Laravel to auto convert values to a given timezone. the Laravel team has made their opinions on automatic conversion known. it is on the user to submit values to the database in their desired timezone.

Parity between datetime and timestamp

Storage requirements

timestamp requires 4 bytes for storage. datetime requires 5 bytes for storage. both allow up to 3 additional bytes for fractional-second precision.

One of the proposals for solving the 2038 problem for timestamp is to increase it to a 64 bit integer, which would increase its storage requirements to 8 bytes.

https://dev.mysql.com/doc/refman/8.4/en/storage-requirements.html#data-types-storage-reqs-date-time
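To put the storage difference in perspective, here is a rough sketch of the arithmetic in Python. The `column_bytes` helper is hypothetical (it is not part of Laravel or MySQL); the base sizes are the ones quoted above, and the fractional-second cost follows the linked MySQL page (0, 1, 2, or 3 extra bytes for precision 0, 1-2, 3-4, 5-6).

```python
# Base sizes in bytes, as quoted in the PR description.
BASE_BYTES = {"timestamp": 4, "datetime": 5}


def column_bytes(column_type: str, fsp: int = 0) -> int:
    """Bytes per value for a temporal column with `fsp` fractional digits.

    Hypothetical helper for illustration only.
    """
    if not 0 <= fsp <= 6:
        raise ValueError("fractional seconds precision must be 0..6")
    # Each pair of fractional digits costs one extra byte: 0, 1, 1, 2, 2, 3, 3.
    return BASE_BYTES[column_type] + (fsp + 1) // 2


rows = 100_000
extra = (column_bytes("datetime") - column_bytes("timestamp")) * rows
print(f"datetime costs {extra} extra bytes per column over {rows} rows")
```

For a table the size of the test table used below, that is roughly 100 KB per date column before indexes.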

Performance

There has been some confusion in other related PRs, Issues, and Discussions about how the performance of datetime would be worse than timestamp because it stores the date as a string, and string comparison is slower than integer comparison.

datetime is actually stored internally in a fixed length binary format which allows comparisons to be just as efficient as integer comparison.
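A minimal sketch of why that holds, assuming the 40-bit layout described in the MySQL internals documentation (1 sign bit, 17 bits for year*13+month, 5 bits day, 5 bits hour, 6 bits minute, 6 bits second). The `pack_datetime` function is hypothetical and ignores fractional seconds; it only illustrates that the packed integers sort in the same order as the dates they represent.

```python
from datetime import datetime


def pack_datetime(dt: datetime) -> int:
    """Pack a datetime into a 40-bit integer (fractional seconds omitted)."""
    value = 1                                      # sign bit, set for valid values
    value = (value << 17) | (dt.year * 13 + dt.month)
    value = (value << 5) | dt.day
    value = (value << 5) | dt.hour
    value = (value << 6) | dt.minute
    value = (value << 6) | dt.second
    return value


a = pack_datetime(datetime(2024, 12, 31, 23, 59, 59))
b = pack_datetime(datetime(2025, 1, 1, 0, 0, 0))

print(a < b)             # True: integer order matches chronological order
print(b.bit_length())    # 40 bits, i.e. 5 bytes
```

Because the most significant fields come first, a plain integer comparison is enough to order two values, and no string parsing is involved.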

For testing, I created a table with the following migration:

Schema::create('tests', function (Blueprint $table) {
    $table->id();
    $table->timestamp('timestamp');
    $table->dateTime('datetime');
    $table->timestamps();
});

I filled the table with 100,000 rows with a random date stored in both the "timestamp" and "datetime" fields. I ran the following queries and had results consistently within 1ms of each other.

SELECT * FROM `tests` WHERE `timestamp` <  '2025-01-01';
SELECT * FROM `tests` WHERE `datetime` <  '2025-01-01';

Allow using "CURRENT_TIMESTAMP"

Both data types allow using "CURRENT_TIMESTAMP" as both the initial (default) value and the "on update" value.

datetime benefits

Solves the 2038 issue

timestamp fields store their value internally as a signed 32 bit integer, which means any dates after 2038-01-19 03:14:07 UTC are not valid for timestamps. this is not as big of an issue right now, since most stored dates are in the past, but could potentially be a huge problem when we reach that date. it does affect current use, too, when you may be storing a future date, like an expiration.

datetime fields have a minimum value of 1000-01-01 and a maximum value of 9999-12-31, giving us a much wider valid date range, and eliminating the 2038 problem.
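The cutoff can be checked with a few lines of Python. This is only a sketch of the arithmetic, not MySQL's own validation; `fits_in_timestamp` is a hypothetical helper, and the lower bound of one second after the epoch is taken from the documented timestamp range.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

# The last instant a timestamp column can represent.
last_valid = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(last_valid.isoformat())  # 2038-01-19T03:14:07+00:00


def fits_in_timestamp(dt: datetime) -> bool:
    """Return True if a timezone-aware datetime fits the 32-bit range."""
    return 1 <= int(dt.timestamp()) <= INT32_MAX


# A far-future expiration date already falls outside the range today.
expires_at = datetime(2040, 1, 1, tzinfo=timezone.utc)
print(fits_in_timestamp(expires_at))  # False
```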

Ignorant of Server/SQL timezone

Lastly, what may be the most important of all the benefits of datetime, it is completely ignorant of the timezone set on either the server or SQL, while timestamp is not.

When a date is entered into a timestamp it will first attempt to convert it to UTC for internal storage. This is dependent on a couple of factors. SQL could have its own explicitly set timezone. More likely, it will be set to "SYSTEM", which means it defers to the timezone set on the OS. Either way, issues arise when SQL deems its timezone to be something other than UTC. Let's say, for example, SQL's timezone is set to CST (UTC-6). When it receives a value for a timestamp field, it will interpret that value as a CST value, convert it to UTC for internal storage, and then convert it back to CST when the value is retrieved. Now, whether you actually intended to give it a CST value is irrelevant, because all you really care about is that the value you gave it is EXACTLY what you got back.

As long as that SQL timezone value stays the same, you're actually kind of ok, even if things don't technically match up. However, things can go very poorly if the SQL timezone changes.

Imagine again we have our server with the timezone set to CST. We insert a row with a CST value, and SQL converts the timestamp field to UTC internally. Now someone comes along and sees that the server is set to CST, but should probably be UTC because that's pretty standard for servers. Unfortunately that simple change would mess up all of our data. Now when that row is retrieved from the database, SQL sees the server is in UTC, so it just gives the internal value it stored back to us, even though that's not correct and should have been converted.

This means the value we put into the database is NOT the value we got out! Some might argue that's intentional, but I would say for the large majority of people any timezone other than UTC on the server is pure happenstance or oversight, and not actually what they intended.

If we switch to datetime fields, SQL ignores any server or SQL timezone settings and simply stores the value you give it, and returns exactly the same value when you request it. By making ourselves ignorant of any server settings, we actually protect ourselves from any unintentional errors like mentioned above.

For some real numbers, assume we started with a server in CST. The table below shows how timestamp and datetime differ.

| Data Type | Submitted Value | Internal Value | Returned Value with Server CST | Returned Value with Server UTC |
| --- | --- | --- | --- | --- |
| timestamp | 2020-02-12 12:00:00 | 2020-02-12 18:00:00 | 2020-02-12 12:00:00 | 2020-02-12 18:00:00 |
| datetime | 2020-02-12 12:00:00 | 2020-02-12 12:00:00 | 2020-02-12 12:00:00 | 2020-02-12 12:00:00 |
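The round trip described in this section can be simulated in a few lines of Python. This is a sketch of the behavior as described here, not of MySQL's implementation: `timestamp_store` and `timestamp_read` are hypothetical helpers, and CST is modelled as a fixed UTC-6 offset with no daylight saving.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=-6), "CST")  # assumption: fixed UTC-6 offset
UTC = timezone.utc
FMT = "%Y-%m-%d %H:%M:%S"


def timestamp_store(value: str, session_tz: timezone) -> datetime:
    """Read the wall-clock string in the session zone and keep it as UTC."""
    return datetime.strptime(value, FMT).replace(tzinfo=session_tz).astimezone(UTC)


def timestamp_read(stored: datetime, session_tz: timezone) -> str:
    """Convert the stored UTC instant back into the session zone."""
    return stored.astimezone(session_tz).strftime(FMT)


submitted = "2020-02-12 12:00:00"

# timestamp column: written while the server is on CST.
stored = timestamp_store(submitted, CST)
print(stored.strftime(FMT))         # 2020-02-12 18:00:00 (internal UTC value)
print(timestamp_read(stored, CST))  # 2020-02-12 12:00:00 (server still CST)
print(timestamp_read(stored, UTC))  # 2020-02-12 18:00:00 (server moved to UTC)

# datetime column: the string is kept as-is, so the session zone is irrelevant.
datetime_stored = submitted
print(datetime_stored)              # 2020-02-12 12:00:00
```

The timestamp value only matches what was submitted while the session timezone stays the same as it was at write time; the datetime value matches regardless.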

Questionable Changes

One thing I did not change was the softDeletes() method. I think ideally it would change to using datetimes internally, and then a new softDeletesTimestamp() method would be created for that specific use. However, I'm not sure how that would affect existing usage of softDeletes() that were executed when it used timestamps.

"datetimes" are the better default choice for date related columns, and should be the recommended way from Laravel going forward

- address 2038 issue
- only 1 extra byte
- internal binary storage for equal performance
- ignorant of server/SQL timezone
@browner12 browner12 marked this pull request as draft January 18, 2025 23:45
@browner12 browner12 marked this pull request as ready for review January 19, 2025 00:35
@Rizky92

Rizky92 commented Jan 19, 2025

Here's my two cents. While timestamp has the 2038 problem, one alternative is to store the value as an UNSIGNED BIGINT. PHP itself already supports 64-bit timestamps. Internally, Laravel already uses UNSIGNED INT for some of its tables, which can be changed to BIGINT without a breaking change.

I'm not sure about the performance implications, although I believe they should be minimal on both sides.

The only drawback is that storing it as UNSIGNED BIGINT may confuse users because the raw value carries less meaning, and it would be unreadable as a timestamp in other languages that may not have 64-bit support yet.

datetime is fine, I think I'm just too keen on having to deal with timezones at application level or if your constraint is the storage.

@ziming
Contributor

ziming commented Jan 19, 2025

I personally feel it is better to wait until closer to 2038 and see what the consensus is on this topic for timestamps. Maybe by then there will be a better solution, or it will be a non-issue.

@browner12
Contributor Author

@Rizky92 storing as a BIGINT is a poor solution because then we lose readability in DB GUIs, and it will increase storage costs to 8 bytes.

datetime is fine, I think I'm just too keen on having to deal with timezones at application level or if your constraint is the storage.

I don't understand this point. can you elaborate?

@ziming we have a data type literally called "datetime" that was built to handle dates and times. if we start enforcing this good standard now, the 2038 problem literally goes away. regardless of the 2038 aspect, using datetime also takes a foot gun away from people with regards to timezones. now is the time for this better solution.


3 participants