I will discuss how we, as programmers, should handle dates and times, both mentally, and in production code.
java.util.Date and java.util.Calendar
java.time API
I will ignore relativistic effects such as time dilation due to travelling at a very high speed, because only a few programmers, such as those who write code for satellite clocks, need to take them into account. In our everyday lives, no computer travels at relativistic speeds.
I will also ignore leap seconds, as they add unnecessary complexity. There are few applications where leap seconds matter anyway, and the common date and time APIs do not support leap seconds either.
Unless otherwise specified, all Swift code is written in Swift 4.2, and all Java code is written in Java 8.
Handling dates and times correctly is very important when developing software. If not done properly, this could lead to really hard-to-trace bugs, or even catastrophic effects. Here are a few examples of bugs caused by handling dates and times wrongly.
A few decades ago, the year component of a date was usually stored as a two-digit number instead of the full four digits, to save as much memory as possible. This was especially true in embedded systems like elevators and thermostats, where memory was even more limited. As a result, the year 1987 would be represented as 87, for example. However, a representation like this does not differentiate between years that are 100 years apart, such as 1901 and 2001, which are both represented as 01.
Depending on the implementation, on the first day of 2000, computers might display the year as 1900 (99 wraps back to 0), or 19100 (99 increments to 100 and is directly concatenated to the string “19”), or exhibit some other kind of undocumented behaviour. Many predicted that there would be chaos on 1 January 2000 because of this. This is called the Millennium Bug or the Y2K bug.
Fortunately, developers worked very hard and fixed most of the critical bugs before the new year, and the effects of this bug were mild. One of the most significant things that happened was that radiation monitoring equipment failed in Ishikawa, Japan.
Although the Millennium Bug did not cause any long-lasting effects, it did teach us a lesson: store years in their full form, and don’t shorten them. As you will see shortly in the case studies, this is exactly what java.util.Calendar
and java.time
did.
Although the Japanese use the Gregorian calendar just like the west, they also use their own Japanese calendar system. The only difference is that the Japanese system numbers years differently. Years are numbered within regnal eras. Each reign is a different era and is given a different gengō (era name).
In fact, dates are almost always written in the Japanese calendar system in official documents. For example, in this driving license, the expiry date is written as 平成29年10月8日 (8 October, Heisei 29), where “平成 (Heisei)” is the gengō of the reign of Akihito. Heisei 29 then means the 29th year of Akihito’s reign, which is the year 2017 in the Gregorian calendar. The following table shows some more gengō and their corresponding Gregorian calendar years (source).
Gengō | Gregorian Calendar Years |
---|---|
明治 (Meiji) | 1868-1912 |
大正 (Taishō) | 1912-1926 |
昭和 (Shōwa) | 1926-1989 |
平成 (Heisei) | 1989-2019 |
令和 (Reiwa) | 2019 - (current) |
On 1 May 2019, the Japanese era changed from Heisei (平成) to Reiwa (令和), because Naruhito ascended the throne as the new emperor. This change meant that all dates from 1 May 2019 onwards must be formatted as “令和X年”, as opposed to “平成X年”. 2019 would be 令和1年 or 令和元年 (“元” meaning “the first”). However, the name of the new era was announced only a month before the change, and possibly because of this, many developers were not able to update their software systems in time, causing various bugs including:
The last issue was probably caused by storing the year as a Heisei year, and rolling back to 1 at the end of the era. The developers probably forgot to also store the era name, which meant that the software would interpret a value of 1 as Heisei 1, which is 1989, resulting in the wrong formatting shown. In fact, this behaviour was observed in many banks, including Hokkaido Bank, Hokuriku Bank, and Bank of Yokohama.
This incident (and the Millennium Bug) just shows how much thought we need to put in when we are designing systems. We need to be extra careful and think about what might change in the future. This is especially true for systems involving dates and times, because they seem so simple that they are often overlooked. Many Japanese developers did not expect Emperor Akihito to abdicate so soon, and thought the Heisei era would go on for a few more decades, which was why they designed a less flexible system and did not manage to update it in time.
According to this talk by Jon Skeet, who worked on Google Calendar, there was a bug in Google Calendar that was only observed in Brazil. The bug was caused by storing the start and end time of a full-day event as 00:00 of that day and 00:00 of the next day respectively. Although this seemed totally reasonable at first sight, it caused a crash in Brazil because Brazil used to go into daylight saving time at 00:00. This meant that clocks went from 23:59:59 straight to 01:00:00, so the time 00:00:00 did not exist on that day. As a result, trying to parse the time of 00:00 in the local timezone of Brazil caused an exception to be thrown.
Note that Brazil no longer observes DST at the time of this writing.
When we say “00:00”, we don’t actually mean 00:00 the time. We mean the start of the day. For most timezones, this would be the same as the time 00:00, but in Brazil, on the day of transitioning into DST, it would be 01:00. To handle this, we would need to write a function that takes a date and returns a time that marks the start of that day. This is very similar to the LocalDate.atStartOfDay(ZoneId)
method in java.time
, which will be discussed in more detail in the second case study.
In general, we should never assume that any local time exists, because there might be somewhere in the world that uses that time as the daylight saving time transition, or has a transition across that time. The atStartOfDay
function solves this problem because it essentially checks whether that time exists or not, and returns a value based on the check.
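The behaviour described above can be sketched in Java 8 as follows. This is only an illustration: the class name, the zone America/Sao_Paulo and the date 15 October 2017 (a day on which DST began at midnight there, according to the tzdb) are my own choices, not taken from the Google Calendar code.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class StartOfDayDemo {
    // Returns the first moment of the given date that actually exists in the zone.
    static ZonedDateTime startOfDay(LocalDate date, ZoneId zone) {
        return date.atStartOfDay(zone);
    }

    // True if the wall-clock time 00:00 exists on the given date in the zone.
    static boolean midnightExists(LocalDate date, ZoneId zone) {
        LocalDateTime midnight = date.atTime(0, 0);
        return !zone.getRules().getValidOffsets(midnight).isEmpty();
    }

    public static void main(String[] args) {
        // Assumed example values: a Brazilian zone and a day that starts with a DST gap.
        ZoneId saoPaulo = ZoneId.of("America/Sao_Paulo");
        LocalDate dstStart = LocalDate.of(2017, 10, 15);

        System.out.println(midnightExists(dstStart, saoPaulo)); // 00:00 falls in the gap
        System.out.println(startOfDay(dstStart, saoPaulo));     // the day starts at 01:00
    }
}
```

Storing the result of `startOfDay` rather than a hard-coded 00:00 avoids ever constructing a local time that does not exist.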
In the iOS app “EZDiary”, which is a diary app, there used to be a time-zone related bug. When the user travels to another timezone, all the entries that the user has written in the original timezone will disappear. When the user travels back however, they re-appear.
After examining the source code (note that the repository’s name is different from the display name on the App Store, and you may have to go back to a previous commit to see the code with the bug), I realised where the problem was. When saving a diary entry, the developer gets the current Date
and truncates off the time part (making the time component represent the start of that day), and uses the resulting Date
as the date of the entry. The FSCalendar
API uses the following delegate method to determine how many dots should be shown on a particular date (the app shows one dot to represent that there is a diary entry on that day):
// Swift 2
func calendar(calendar: FSCalendar, numberOfEventsForDate date: NSDate) -> Int {
    return entries[date] != nil ? 1 : 0
}
The developer implemented the method as shown. He was trying to use the date
parameter to access a dictionary called entries
containing the dates of the saved diary entries (which are time-truncated) as keys and the diary entries on those days as values. Since FSCalendar
passes in time-truncated Date
s as well, this works well, as long as you are in one timezone, that is.
The bug is probably partly due to the developer’s misunderstanding of what a Date
object represents. A Date
object represents a single point in time, as per the documentation:
A
Date
value encapsulate a single point in time, independent of any particular calendrical system or time zone.
Therefore, when we truncate the time component, the resulting Date
is only time-truncated in the local timezone and any timezones with the same offset at that time. I will explain why that is in the next section, so don’t worry if that is not immediately obvious. When the user goes to another timezone with a different offset, the saved time is still time-truncated in the old timezone, but the dates that FSCalendar
is passing in are time-truncated in the new timezone. This means that the saved date and the date
parameter would never be the same, so the entries
dictionary will never contain any of the date
parameters passed in, causing the method to always return 0. Hence, all the diary entries appear to disappear.
When we are storing dates and times, we must think carefully about what information we actually need, and store only those values that we need and nothing more. In this case, the only things that mattered to us were the year, month, and day of the date on which the entry was created, so we should have stored the date in a data structure that encapsulates only those three things. Then in the delegate method, we will have no choice but to extract the year, month, day from the date
parameter, and use those three values to access the dictionary instead.
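To make the difference concrete, here is a rough Java 8 sketch (the app itself is written in Swift, so this is an analogy and not the app's code). The class name, the two zones and the date are assumptions chosen for illustration. It compares keying entries by a start-of-day point in time with keying them by a plain local date.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

class DiaryKeyDemo {
    // The buggy kind of key: the point in time at which the day starts in a given zone.
    static Instant startOfDayKey(LocalDate day, ZoneId zone) {
        return day.atStartOfDay(zone).toInstant();
    }

    public static void main(String[] args) {
        // Assumed scenario: an entry is written in Hong Kong and later viewed in London.
        ZoneId hongKong = ZoneId.of("Asia/Hong_Kong");
        ZoneId london = ZoneId.of("Europe/London");
        LocalDate day = LocalDate.of(2019, 7, 16);

        // Keys are points in time, so the lookup depends on the device's timezone.
        Map<Instant, String> byInstant = new HashMap<>();
        byInstant.put(startOfDayKey(day, hongKong), "entry written in Hong Kong");
        System.out.println(byInstant.containsKey(startOfDayKey(day, london))); // false

        // Keys are year, month and day only, so the lookup is timezone-independent.
        Map<LocalDate, String> byDate = new HashMap<>();
        byDate.put(day, "entry written in Hong Kong");
        System.out.println(byDate.containsKey(LocalDate.of(2019, 7, 16))); // true
    }
}
```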
As seen from the previous two examples, working with timezones in code can be confusing. For some people, even working with them outside the context of software engineering is a pain. Therefore, we really need a way to think about them effectively. Here, I will present a model for this purpose, and programmers should use this when dealing with timezones, because this can be used to create a good visualisation of what they are doing.
(Note that this model is only intended for a better mental understanding of timezones, and as a useful tool for finding logical errors in timezone-related code by visualising time zones themselves. It does not necessarily reflect how timezones are actually represented in computer systems. This model is based on the idea mentioned in this talk by Jon Skeet.)
Since we are ignoring relativistic effects, time is absolute for everyone and it can be said that there is a “universal time line”, similar to the real number line. Every point in time (past, present, future) lies on this line.
We are travelling on this line at a rate of 1 second per second, always. All the points on this line form the set of all points in time. Note that the elements of this set are points in time, not tuples of years, months, days, hours, minutes, seconds, and milliseconds. The values of year, month and day would depend on the calendar system, and the values of hour and minute would depend on the timezone we use. Timezones and calendar systems are man-made ideas, so a tuple of these values would not be suitable to represent such a fundamental concept as a “point in time”.
A point in time can be thought of as a real number: the number of seconds since some arbitrarily defined epoch. A negative number represents a point in time before the epoch. This can be useful when we want to plot a graph, which we will do in a moment.
If we are in a timezone, there would be the concept of local date times. A local date time is a tuple of a date, hours, minutes, seconds and milliseconds. The date can be represented in any calendar system. The calendar system does not quite matter for the purpose of understanding timezones. I will format local date times using the Gregorian Calendar for the rest of this document. This paragraph from Jon Skeet’s NodaTime blog post explains the difference between points in time (which he calls “instants”) and local date times (which he calls “local date and times”):
A local date and time isn’t tied to any particular time zone. At this moment, is it before or after “10pm on August 20th 2011”? It depends where you are in the world. (I’m leaving aside any non-ISO calendar representations for the moment, by the way.) So a
DateTimeOffset
contains a time-zone-independent component (that “10pm on …” part) but also an offset from UTC - which means it can be converted to an instant on what I think of as the time line. Ignoring relativity, everyone on the planet experiences a a particular instant simultaneously. If I click my fingers (infinitely quickly!) then any particular event in the universe happened before that instant, at that instant or after that instant. Whether you were in a particular time zone or not is irrelevant. In that respect instants are global compared to the local date and time which any particular individual may have observed at a particular instant.
Two local date times are equal if and only if their dates represent the same day, and their hours, minutes, seconds, and milliseconds are all equal.
Consider the set of all local date times. A sentence like this should make quite a lot of sense now.
At a particular point in time, the local date time in New York has one value, and the local date time in Hong Kong has another.
By reading the above statement, you might have realised that different timezones seem to transform an element of the set of points in time to different elements of the set of local date times, which is exactly what this model of timezones is about.
In essence, a timezone can be modelled as a function that transforms a point in time to a local date time. I call this the timezone function: it maps the set of points in time to the set of local date times.
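As a rough illustration of this idea, the model maps fairly directly onto java.time: an `Instant` plays the role of a point in time, a `LocalDateTime` plays the role of a local date time, and a `ZoneId` selects which timezone function to apply. The class name, method name and the example instant below are my own assumptions.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

class TimezoneFunctionDemo {
    // The "timezone function" of the model: point in time -> local date time.
    static LocalDateTime timezoneFunction(ZoneId zone, Instant pointInTime) {
        return LocalDateTime.ofInstant(pointInTime, zone);
    }

    public static void main(String[] args) {
        // An arbitrary point in time, chosen for illustration.
        Instant t = Instant.parse("2019-07-16T12:00:00Z");

        // The same point in time gives different local date times in different timezones.
        System.out.println(timezoneFunction(ZoneId.of("America/New_York"), t)); // 2019-07-16T08:00
        System.out.println(timezoneFunction(ZoneId.of("Asia/Hong_Kong"), t));   // 2019-07-16T20:00
    }
}
```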
Although we have defined what a Universal Time of 0 means (the point in time at which the epoch occurs, i.e. 0 seconds since the epoch), we have not defined what a local date time of 0 is. By defining this, we will be able to put the local date times onto a line too. It should be obvious, by common sense, what should go on both sides of the local date time of 0, given that the local date time of 0 is defined.
I define the local date time of 0 as the local date time in the timezone UTC at the point in time at which the epoch occurs. This is because UTC is the primary time standard used to regulate clocks on earth.
Now we can sketch the graph of the timezone function for a particular timezone. It would look something like this:
(Note that “Universal Time” refers to the aforementioned universal time line on which we are travelling at a rate of 1 second per second, not the thing described in this Wikipedia article.)
Observe that the graph is a straight line with gradient 1. The reason for this is obvious - the local date time elapses at the same rate as the Universal Time, at a rate of 1 second per second (we are again ignoring relativistic effects here). The line also has a negative y-intercept, which means that at the epoch, the local date time at this timezone is some time behind UTC, i.e. this timezone has a negative UTC offset at the epoch. Similarly, timezones that have a positive UTC offset at the epoch will have a graph with a positive y-intercept. The timezone of UTC (and timezones with a UTC offset of 0 at the epoch) will have a graph that goes through the origin:
This relationship to the y-intercept is another reason why UTC is used in the definition of the local date time of 0.
A graph of the timezone function of a timezone is called a timezone diagram. Just like how spacetime diagrams can be useful when visualising the effects of relativity, timezone diagrams can be useful when visualising timezones.
Now is a good time to clarify what I mean by “timezone”. “Timezone” in this document refers to a timezone in the IANA time zone database (tzdb). The database defines “timezone” like this (emphasis mine):
The tz database attempts to record the history and predicted future of all computer-based clocks that track civil time. It organizes time zone and daylight saving time data by partitioning the world into timezones whose clocks all agree about timestamps that occur after the POSIX Epoch (1970-01-01 00:00:00 UTC). The database labels each timezone with a notable location and records all known clock transitions for that location. Although 1970 is a somewhat-arbitrary cutoff, there are significant challenges to moving the cutoff earlier even by a decade or two, due to the wide variety of local practices before computer timekeeping became prevalent.
Source: IANA Time Zone Database version 2019b, theory.html
This map shows all the timezones in the tzdb (version 2019b):
(Source)
As you can see, each “timezone” is a lot smaller than what we traditionally think of as a “time zone”, that is, a “vertical strip” of the world map, adjusted to fit country borders, like this:
(Source)
From the two maps, we can see that timezones are generally a lot smaller than time zones. This is because the time zones are the set of places that share the same local date time now, whereas timezones are the set of places that have been sharing the same local time since the POSIX Epoch. Generally, terms such as “Eastern Time (ET)”, “Western European Time (WET)”, refer to time zones, whereas tzdb identifiers such as America/New_York
, Europe/London
, refer to timezones.
I will be using the identifiers in the tzdb to refer to particular timezones in the rest of this document. For example, Europe/London
refers to the timezone that the UK is in.
As a concrete example, Libya and Egypt share the same time zone (see second map) of Eastern European Time, but they are in different timezones (see first map) - Africa/Tripoli
and Africa/Cairo
. This is because Libya stayed on DST in October 2013 (effectively changing its time zone from Central European Time to Eastern European Time permanently), whereas Egypt stopped observing DST in April 2015 (effectively staying on Eastern European Time permanently). This meant that before October 2013, the local time in Libya was almost never in sync with the local time in Egypt. Another more obscure example is Detroit and New York, where Detroit has, in the past, transitioned into DST later than the rest of the Eastern Time zone, causing America/Detroit
and America/New_York
to be two different timezones.
Although this model can be used to model both timezones and time zones, I think it is more useful to think in terms of timezones, because countries like the UK change time zones very often (BST during summer, WET during winter). In fact, a place’s time zone could have been changed at any random moment in the past for political reasons. On the other hand, the UK has always been in the same timezone - Europe/London
, whose offset changes twice a year. The whole UK will stay in Europe/London
unless a political change occurs, causing parts of the UK to observe a different time. All the historical changes in the offset are encapsulated in the timezone as well, creating another layer of abstraction, which makes it easier to think about.
If you are still not sure about what timezones are, I will show more examples of timezones in this sense, especially when the timezone observes DST, or when there is some other change in the offset.
In a lot of timezones, the offset from UTC changes twice a year - the offset generally becomes one hour more for a few months, then it changes back to the original offset. This is known as daylight saving time (DST). The reasons for and against DST, and its many oddities, are topics worth researching, but are unfortunately out of the scope of this project.
The changes in UTC offset of a timezone are called transitions or offset transitions. The transitions due to DST are called DST transitions. Transitions are modelled with discontinuities in the graph of the timezone function. For example, this is a timezone diagram for Europe/London
(extremely not to scale, showing the duration of one year only):
The transition at A is the DST transition into daylight saving time, which happens usually at the end of March. The transition at B is the DST transition out of daylight saving time, which happens usually at the end of October. The function is usually defined at the point after the transition, and not at the point before the transition.
A transition across which the offset increases is called a gap transition (e.g. the one at A), and a transition across which the offset decreases is called an overlap transition (e.g. the one at B).
It would make sense that the timezone function is defined for every point in time over the range in which humans have actually been tracking time reasonably accurately. (After all, it would be weird if, for some point in time, the local time in the UK did not exist, wouldn’t it?) However, for a given local date time, there does not necessarily exist exactly one point in time that the timezone function maps to it, i.e. timezone functions are not necessarily bijective. For example, in Europe/London, there are two such points in time when the local date time lies in the overlap at B, and there are none when it lies in the gap at A.
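A small Java 8 sketch of this classification, using the transition information that java.time exposes through `ZoneRules`. The class and method names are hypothetical, and the two 2019 dates (the UK's DST transitions in that year) are my own example values.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

class TransitionKindDemo {
    // Describes how many points in time map to the given local date time in the zone.
    static String classify(ZoneId zone, LocalDateTime local) {
        ZoneRules rules = zone.getRules();
        // getTransition returns null unless the local date time lies in a gap or an overlap.
        ZoneOffsetTransition transition = rules.getTransition(local);
        if (transition == null) {
            return "exactly one point in time";
        }
        return transition.isGap() ? "gap: no point in time" : "overlap: two points in time";
    }

    public static void main(String[] args) {
        ZoneId london = ZoneId.of("Europe/London");
        System.out.println(classify(london, LocalDateTime.of(2019, 3, 31, 1, 30)));  // in the gap (A)
        System.out.println(classify(london, LocalDateTime.of(2019, 10, 27, 1, 30))); // in the overlap (B)
        System.out.println(classify(london, LocalDateTime.of(2019, 7, 16, 12, 0)));  // an ordinary time
    }
}
```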
Timezones can have transitions for other reasons, too. For example, the Pacific/Apia
timezone, adopted by the nation of Samoa, had a transition from UTC-11 to UTC+13 (both standard offsets) in 2011, essentially “skipping a day”. Apparently, this was to help trade with New Zealand. It also made Samoa one of the first places to celebrate the new year. On the timezone diagram of Pacific/Apia
, this would be represented in a similar way to a transition into DST (as they are both gap transitions), but with a larger gap.
Note that Pacific/Apia
also observes DST, but since it is in the southern hemisphere, the DST transitions are almost the reverse of that in the UK. Pacific/Apia
transitions into DST at the end of September and transitions out of it at the start of April.
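The skipped day can be observed from Java 8, assuming the JDK's bundled tzdb includes the 2011 change. In this sketch the class and method names are mine, and 30 December 2011 is the local date that the tzdb records as skipped. Because DST was in effect at the time, the total offsets on either side of the transition may differ by an hour from the standard offsets quoted above, but the size of the jump is still a full day.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;

class ApiaSkippedDayDemo {
    static final ZoneId APIA = ZoneId.of("Pacific/Apia");

    // Length of the gap containing the given local date time, or zero if it is not in a gap.
    static Duration gapLengthAt(LocalDateTime local) {
        ZoneOffsetTransition t = APIA.getRules().getTransition(local);
        return (t != null && t.isGap()) ? t.getDuration() : Duration.ZERO;
    }

    public static void main(String[] args) {
        // Noon on the skipped day lies inside a 24-hour gap.
        System.out.println(gapLengthAt(LocalDateTime.of(2011, 12, 30, 12, 0)));

        // Asking for the start of the skipped day lands on the following day.
        System.out.println(LocalDate.of(2011, 12, 30).atStartOfDay(APIA));
    }
}
```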
During World War II, Britain adopted “double daylight saving time”: its clocks were 1 hour ahead of UTC during winter, and 2 hours ahead during summer. This was to save more fuel and help the workers get home before the blackouts. Showing this on a timezone diagram of Europe/London
is left as an exercise for the reader.
From this, we can see that transitions can happen in many ways and for many reasons. Fortunately, we can use the graph of the timezone function to visualise almost all of them. One thing to keep in mind is that if a timezone has a transition, then its timezone function is not bijective. This is because a transition is either a gap transition or an overlap transition. In the case of a gap transition, the function is not surjective. In the case of an overlap transition, the function is not injective.
In general, we should not assume any timezone function is injective or surjective when writing code, because even though a timezone does not have any transitions now, it might have one in the future. This means that when we are converting a value representing a local date time to a value representing a point in time, we should always handle the cases where there is no such value, and where there are multiple such values. This is why we should not assume that any local date time has a corresponding point in time in a particular timezone.
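One possible way to make all three cases explicit in Java 8 is to return every matching point in time and let the caller decide what to do with zero, one or two results. This is a sketch under my own naming (the class and the `pointsInTime` helper are hypothetical), built on `ZoneRules.getValidOffsets`.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

class LocalToInstantDemo {
    // Returns every point in time at which clocks in the zone showed the given
    // local date time: none in a gap, two in an overlap, one otherwise.
    static List<Instant> pointsInTime(LocalDateTime local, ZoneId zone) {
        List<Instant> result = new ArrayList<>();
        for (ZoneOffset offset : zone.getRules().getValidOffsets(local)) {
            result.add(local.toInstant(offset));
        }
        return result;
    }

    public static void main(String[] args) {
        ZoneId london = ZoneId.of("Europe/London");
        // Example values assumed for illustration: the UK's 2019 DST transitions.
        System.out.println(pointsInTime(LocalDateTime.of(2019, 3, 31, 1, 30), london).size());  // 0
        System.out.println(pointsInTime(LocalDateTime.of(2019, 10, 27, 1, 30), london).size()); // 2
        System.out.println(pointsInTime(LocalDateTime.of(2019, 7, 16, 12, 0), london).size());  // 1
    }
}
```

Returning a list forces the calling code to confront the empty and ambiguous cases instead of silently picking one.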
Let’s revisit the Google Calendar bug with this model of time zones in mind. Whenever we are working with timezones, we should draw a timezone diagram and annotate on it what our code is actually doing.
We know that the database saves the start and end dates of an event as points in time, otherwise this bug would not have existed. Obviously, I am talking retrospectively here because I do not have access to the implementation of Google Calendar, but the programmer who wrote the bug should be aware of how their database stores dates - whether as a point in time, or as a local date time.
We also know that 00:00 on a particular day is a local date time, because it cannot be expressed as a number of seconds since the epoch (we would also need to know the timezone and the offset at that time to do that). We can mark these local date times on the timezone diagram of, say, UTC:
Because we are trying to store the time 00:00 on a particular day into the database, we are essentially trying to transform a local date time into a point in time. In other words, we are trying to go from a point on the y-axis of the graph to a point on the x-axis. For the timezone UTC, it would look like this:
Then we can immediately see that if there was a gap transition like the one shown below, the code would crash:
One might argue that we could always use UTC as the timezone to transform the local date times, which would indeed work, because the timezone function of UTC is defined to be the line y = x, and hence is surjective. However, this would make less sense semantically, as we would be saying that a full-day event in, say, Hong Kong ends at 08:00 in Asia/Hong_Kong
(or 00:00 UTC) the next day.
The aforementioned solution of using an extra atStartOfDay
function that returns the local date time representing the start of day of a given local date can be illustrated in a timezone diagram by shifting one of the tick-marks up to the correct position:
I will now apply this model to EZDiary, explaining why the bug occurred. Again, we should show what time-truncation looks like on a timezone diagram.
We know that the Date
struct represents a point in time, as per the documentation. To create a time-truncated Date
, we first need an instance of DateComponents
with only the year, month, and day of month fields, then call the Calendar.date(from:)
method to convert the DateComponents
to a Date
. An instance of DateComponents
can be either acquired from an existing Date
object, as shown in this Stack Overflow answer (this code actually shows the whole time-truncation process, not just acquiring the DateComponents
):
public func removeTimeStamp(fromDate: Date) -> Date {
    guard let date = Calendar.current.date(from: Calendar.current.dateComponents([.year, .month, .day], from: fromDate)) else {
        fatalError("Failed to strip time from Date object")
    }
    return date
}
or manually created:
var dateComponents = DateComponents()
dateComponents.year = 2019
dateComponents.month = 7
dateComponents.day = 16
DateComponents
is very much like a local date time in the model. It is just that all of its fields are optional. For example, in the above code, I have left all the time-related fields (hour of day, minute of hour, etc) to be nil
. It is not very well documented what exactly happens when we do not supply enough information to Calendar.date(from:)
, but we are quite sure that when the time-related fields are missing and the date-related fields are provided, it returns the Date
(i.e point in time) representing the start of that day in the timezone represented by the Calendar.timeZone
property (which is why this approach to time-truncate a Date
works at all). Note that corner cases where the start of the day is at a non-standard time, such as Brazil, are handled automatically by Calendar
, which can be demonstrated by the following code:
var dateComponents = DateComponents()
dateComponents.year = 1987
dateComponents.month = 10
dateComponents.day = 25 // a DST gap transition occurred in Brazil on this day
var calendar = Calendar.current
calendar.timeZone = TimeZone(identifier: "America/Belem")! // a timezone in Brazil
let date = calendar.date(from: dateComponents)!
// format the time in Brazil's timezone. If we had printed it directly it would have been in UTC
let formatter = DateFormatter()
formatter.timeZone = TimeZone(identifier: "America/Belem")!
formatter.dateStyle = .long
formatter.timeStyle = .long
print(formatter.string(from: date)) // some string showing a date at 1AM, format may differ depending on locale.
In other words, we are going from a local date time (starts of days, represented by an instance of DateComponents
) to a point in time (represented by an instance of Date
), by means of a timezone function (represented by the timeZone
property of the Calendar
that we are using). Although this is not clearly documented either, I can conclude from experience, that the default value of Calendar.timeZone
is the device’s current timezone, which most people have set to automatically update when the device moves to a new location.
We can draw a timezone diagram like this to show what happens when a user wrote some diary entries in one timezone (red) and then travelled to another timezone (blue).
The tick-marks on the y-axis represent the starts of days. In the red timezone, the Calendar.date(from:)
will only ever return the points in time A, C, E… These are the Date
objects that are saved into the database. After the user travels to the blue timezone, it will only ever return the points in time B, D, F… These are the points in time that FSCalendar
will use to call the delegate method. As you can see, they are completely different points in time, which is why when the programmer tries to access the dictionary containing keys [A, C, E…] with the Date
from FSCalendar
, they always get nil
.
Now you should understand why a time-truncated Date
is only time-truncated in a specific timezone, and in timezones with the same offset at that time.
One could argue that another way to fix this issue is to always set Calendar.timeZone
to UTC, so that all the Dates
are time-truncated in the same timezone, no matter what the device’s timezone is. This indeed will work, but since FSCalendar
does not set the timeZone
(i.e. uses the device’s current timezone), the code to convert the dates from FSCalendar
to UTC-time-truncated dates will add a lot of clutter. There is also the possibility of forgetting to convert it (they are both Date
types so it can be easily missed), which can cause all the diary entries to shift by one day, depending on the time zone.
Therefore, I believe the best solution in terms of maintainability and style is to store the creation dates as some representation of a local date. A data structure consisting of three integers - year, month, day of month - feels the most straightforward to me, and generally works in most cases. If space is a concern, then using a single 32 bit integer (4 bits representing the month, 5 bits representing the day of month, and the rest representing the year) is also a valid approach. The point is that the value represents a date and only a date. However, for a diary app targeted at iOS, I doubt space would be a bottleneck.
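As a sketch of the compact representation mentioned above, here is one way to pack a date into a single 32-bit integer in Java. The exact bit layout (day in the lowest 5 bits, month in the next 4, year in the remaining bits) and all names are assumptions made for this example. The code does not validate its inputs.

```java
class PackedDate {
    // Assumed layout: bits 0-4 hold the day, bits 5-8 the month, the remaining bits the year.
    static int pack(int year, int month, int day) {
        return (year << 9) | (month << 5) | day;
    }

    // The arithmetic shift preserves the sign, so negative years survive a round trip.
    static int year(int packed) {
        return packed >> 9;
    }

    static int month(int packed) {
        return (packed >> 5) & 0xF;
    }

    static int day(int packed) {
        return packed & 0x1F;
    }

    public static void main(String[] args) {
        int packed = pack(2019, 7, 16);
        System.out.println(year(packed) + "-" + month(packed) + "-" + day(packed)); // 2019-7-16
    }
}
```

With the year in the high bits, comparing two packed values as integers also compares the dates chronologically, which is convenient for sorting.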
Now that we have gained a better understanding of what timezones really are, everything else should be relatively simple. For the rest of this project, I will analyse some date and time APIs, what you should be careful of when using them, and what you can learn from them when designing your own APIs.
Note: All source code showing the implementation of Java’s standard library are taken from my local copy of the JDK (Java SE, version 1.8.0_73). It can be downloaded here.
java.util.Date and java.util.Calendar
Date in JDK 1.0 - An Abomination
java.util.Date was introduced all the way back in JDK 1.0, and most of its functionality was quickly deprecated in favour of java.util.Calendar
and java.text.DateFormat
when JDK 1.1 came along. Later, a whole new java.time
package was introduced in JDK 1.8 (aka Java SE 8). As a result, programmers are discouraged from using Date
and Calendar
. A survey in 2018 showed that only 12% of people are using a version older than Java 8, so unless you are one of the 12%, you should not be using these classes.
Although Date
and Calendar
are effectively obsolete, it is still useful to study how badly they are designed, so that we don’t repeat these mistakes.
Date
represents a point in time using an integer number of milliseconds since the Java Epoch (January 1, 1970, 00:00:00 GMT). This can be seen in its source code as a field named fastTime
. Just a few lines below fastTime
's declaration, there is also a declaration for cdate
:
private transient long fastTime;
/*
* If cdate is null, then fastTime indicates the time in millis.
* If cdate.isNormalized() is true, then fastTime and cdate are in
* synch. Otherwise, fastTime is ignored, and cdate indicates the
* time.
*/
private transient BaseCalendar.Date cdate;
According to the comment, Date can use either or both of these fields to represent the date. cdate is of type sun.util.calendar.BaseCalendar.Date, which is an internal class rather than public API (I have not looked at its source), but from its public interface, I can guess that it is represented by a number of calendar components (year, month, day of month, hour of day, minute of hour, etc), with the addition of a timezone. Seeing that there are many calls to TimeZone.getDefaultRef in the Date class, the timezone is probably set to the device’s timezone.
The normalize
private method seems to synchronise the cdate
field with the fastTime
field:
private final BaseCalendar.Date normalize() {
if (cdate == null) {
BaseCalendar cal = getCalendarSystem(fastTime);
cdate = (BaseCalendar.Date) cal.getCalendarDate(fastTime,
TimeZone.getDefaultRef());
return cdate;
}
// Normalize cdate with the TimeZone in cdate first. This is
// required for the compatible behavior.
if (!cdate.isNormalized()) {
cdate = normalize(cdate);
}
// If the default TimeZone has changed, then recalculate the
// fields with the new TimeZone.
TimeZone tz = TimeZone.getDefaultRef();
if (tz != cdate.getZone()) {
cdate.setZone(tz);
CalendarSystem cal = getCalendarSystem(cdate);
cal.getCalendarDate(fastTime, cdate);
}
return cdate;
}
There are also getXXX methods that return a particular calendar component. For example, getYear returns the number of years since 1900:
public int getYear() {
return normalize().getYear() - 1900;
}
The other getXXX methods are implemented in a similar way: first call normalize, then call the corresponding getXXX method on the returned BaseCalendar.Date instance, and optionally do some calculations. Apparently, getYear returns a 1900-based year because C did the same thing (the tm_year field of struct tm), but I still don’t think “C did it” is a good reason for designing this so counterintuitively. The corresponding setXXX methods that set the calendar components change cdate, but do not synchronise fastTime with it.
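To make the 1900-based year concrete, here is a minimal sketch that reads the deprecated components of the epoch. The class name LegacyDateFields is made up for illustration, and the default timezone is temporarily pinned to UTC because Date always reads its components in the default timezone.

```java
import java.util.Date;
import java.util.TimeZone;

class LegacyDateFields {
    /** Returns {getYear(), getMonth(), getDate()} for the epoch, read in UTC. */
    @SuppressWarnings("deprecation")
    static int[] epochFields() {
        TimeZone previous = TimeZone.getDefault();
        // Pin the default timezone for the demo, then restore it afterwards.
        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
        try {
            Date epoch = new Date(0L); // 1 January 1970, 00:00:00 GMT
            return new int[] { epoch.getYear(), epoch.getMonth(), epoch.getDate() };
        } finally {
            TimeZone.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        int[] f = epochFields();
        // Year is 1970 - 1900, month is zero-based, day of month is one-based.
        System.out.println(f[0] + " " + f[1] + " " + f[2]); // 70 0 1
    }
}
```

Three components, three different conventions: this inconsistency is a large part of why these getters are so error-prone.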
Then there is the parse method, and the constructor taking a String that calls parse. parse returns, as a long, the number of milliseconds since the epoch that the string represents. The parse method is about 160 lines long, goes as deep as 10 levels of nesting, and is also badly designed in more than one way:
When parsing fails, an IllegalArgumentException without a message is thrown - there is no indication of which part of the string could not be parsed.
There are also the toLocaleString and toGMTString methods, which return the string representation of the Date in the device’s timezone and GMT (which for our intents and purposes is the same as UTC) respectively. You cannot specify the locale in which you want the string to be formatted either, which makes localisation very difficult.
To summarise, the Date
class is responsible for:
That is too many responsibilities for Date
, a blatant violation of the Single Responsibility Principle.
Calendar And Others In JDK 1.1 - Better But Not Good Enough
In JDK 1.1, things improved a little. The non-deprecated members of Date are now only responsible for responsibility 1, as responsibilities 2 and 3 are said to be handled by Calendar, and 4 and 5 are handled by DateFormat. This somewhat solves the localisation issue. Calendar also uses a 1-based year instead of a 1900-based year. However, Calendar is not very well designed either. It still stores two representations of a date - one as a set of calendar components, and one as a number of milliseconds since the epoch:
// Internal notes:
// Calendar contains two kinds of time representations: current "time" in
// milliseconds, and a set of calendar "fields" representing the current time.
// The two representations are usually in sync, but can get out of sync
// as follows.
// 1. Initially, no fields are set, and the time is invalid.
// 2. If the time is set, all fields are computed and in sync.
// 3. If a single field is set, the time is invalid.
// Recomputation of the time and fields happens when the object needs
// to return a result to the user, or use a result for a computation.
/**
* The calendar field values for the currently set time for this calendar.
* This is an array of <code>FIELD_COUNT</code> integers, with index values
* <code>ERA</code> through <code>DST_OFFSET</code>.
* @serial
*/
@SuppressWarnings("ProtectedField")
protected int fields[];
/**
* The currently set time for this calendar, expressed in milliseconds after
* January 1, 1970, 0:00:00 GMT.
* @see #isTimeSet
* @serial
*/
@SuppressWarnings("ProtectedField")
protected long time;
The computeFields
method is used to synchronise the fields
with the time
, whereas the computeTime
method is used to synchronise the time
with the fields
. computeFields
is called when time
is set:
public void setTimeInMillis(long millis) {
// If we don't need to recalculate the calendar field values,
// do nothing.
if (time == millis && isTimeSet && areFieldsSet && areAllFieldsSet
&& (zone instanceof ZoneInfo) && !((ZoneInfo)zone).isDirty()) {
return;
}
time = millis;
isTimeSet = true;
areFieldsSet = false;
computeFields();
areAllFieldsSet = areFieldsSet = true;
}
…potentially before a field is set:
public void set(int field, int value)
{
// If the fields are partially normalized, calculate all the
// fields before changing any fields.
if (areFieldsSet && !areAllFieldsSet) {
computeFields();
}
internalSet(field, value);
isTimeSet = false;
areFieldsSet = false;
isSet[field] = true;
stamp[field] = nextStamp++;
if (nextStamp == Integer.MAX_VALUE) {
adjustStamp();
}
}
…and even potentially during a call to get
!
public int get(int field)
{
complete();
return internalGet(field);
}
...
protected void complete()
{
if (!isTimeSet) {
updateTime();
}
if (!areFieldsSet || !areAllFieldsSet) {
computeFields(); // fills in unset fields
areAllFieldsSet = areFieldsSet = true;
}
}
This synchronisation between the two representations has caused many performance problems and bugs. The synchronisation is done via a timezone represented by an instance of java.util.TimeZone
, which only allowed a daylight saving time to be offset from the standard time by one hour. This means that the points in time during WW2 in the UK could not be represented correctly, as the UK was using double DST. Although this has been fixed, the fix is not very flexible.
Like Date, Calendar uses the device’s timezone by default. If you are writing client side code, this is usually fine, but on the server side, it is often preferable to use the client’s timezone, or a time standard like UTC, to compute dates and times. Therefore, you should always call Calendar.getInstance with a TimeZone object. This applies not only to Calendar, but also to the java.time API when calling the now methods.
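As a small sketch of why the timezone should be passed explicitly, the following reads the hour-of-day of the same instant (the epoch) through two Calendar instances. The class and method names are hypothetical; the zone IDs are ordinary tz database identifiers.

```java
import java.util.Calendar;
import java.util.TimeZone;

class CalendarWithExplicitZone {
    /** Hour-of-day of the epoch as seen in the given timezone. */
    static int hourOfDayAtEpoch(String zoneId) {
        // Passing the TimeZone explicitly avoids depending on the machine's default.
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone(zoneId));
        cal.setTimeInMillis(0L);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        System.out.println(hourOfDayAtEpoch("UTC"));        // 0
        System.out.println(hourOfDayAtEpoch("Asia/Tokyo")); // 9
    }
}
```

The same point in time yields different calendar components depending on the zone, so code that silently uses the default zone behaves differently from one server to the next.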
On the other hand, Date in JDK 1.1 does its (now sole) job of representing a point in time quite well, as long as you don’t use any of the deprecated methods. We should really assign our classes one and only one job.
A common feature of Date and Calendar is that they are both mutable and not thread-safe. Calendar is so mutable that calling get to read a particular calendar component can change its state, because the fields will be set by a call to computeFields, called by complete (code above). In fact, time can be set too, since updateTime calls computeTime! Neither class makes any attempt at thread-safety: none of their fields are volatile or use one of the thread-safe AtomicXXX types. None of their methods are synchronized either (except writeObject and the static getAvailableLocales), so it is up to the programmer using them to remember to put them in synchronized blocks. Being this thread-unsafe and this mutable is a recipe for disaster, because there is no guarantee that the fields are mutated in a synchronised way. They could be set totally out of order.
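Even without threads, mutability forces callers to copy defensively. The sketch below uses a hypothetical Event class to show how handing out an internal Date lets outside code change the event, and how copying avoids it.

```java
import java.util.Date;

class MutableDateLeak {
    /** Hypothetical class holding a start time as a mutable Date. */
    static final class Event {
        private final Date start;

        Event(Date start) {
            this.start = new Date(start.getTime()); // defensive copy on the way in
        }

        Date getStartUnsafe() { return start; }                     // leaks internal state
        Date getStartSafe()   { return new Date(start.getTime()); } // defensive copy out
        long startMillis()    { return start.getTime(); }
    }

    public static void main(String[] args) {
        Event e = new Event(new Date(0L));
        e.getStartSafe().setTime(5000L);   // only changes the copy
        System.out.println(e.startMillis()); // 0
        e.getStartUnsafe().setTime(5000L); // changes the event itself
        System.out.println(e.startMillis()); // 5000
    }
}
```

Note that the field being final does not help: final only fixes the reference, not the state of the Date it points to.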
Overall, the reasons why Date
and Calendar
should not be used mentioned above (plus some others) are summarised by this excerpt of the Java Specification Request for the java.time
APIs in Java 8, JSR 310 (emphasis mine):
Currently Java SE has two separate date and time APIs - java.util.Date and java.util.Calendar. Both APIs are consistently described as difficult to use by Java developers on weblogs and forums. Notably, both use a zero-index for months, which is a cause of many bugs. Calendar has also suffered from many bugs and performance issues over the years, primarily due to storing its state in two different ways internally.
One classic bug (4639407) prevented certain dates from being created in a Calendar object. A sequence of code could be written that could create a date in some years but not in others, having the effect of preventing some users from entering their correct birth dates. This was caused by the Calendar class only allowing a daylight savings time gain of one hour in summer, when historically it was plus 2 hours around the time of the second world war. While this bug is now fixed, if at some point in the future a country chose to introduce a daylight savings time gain of plus three hours in summer, then the Calendar class would again be broken.
The current Java SE API also suffers in multi-threaded environments. Immutable classes are known to be inherently thread-safe as their state cannot change. However, both Date and Calendar are mutable, which requires programmers to consider cloning and threading explicitly. In addition, the lack of thread-safety in DateTimeFormat is not widely known, and has been the cause of many hard to track down threading issues.
As well as the problems with the classes that Java SE has for datetime, it has no classes for modelling other concepts. Non-time-zone dates or times, durations, periods and intervals have no class representation in Java SE. As a result, developers frequently use an int to represent a duration of time, with javadoc specifying the unit.
The lack of a comprehensive date and time model also results in many common operations being trickier than they should be. For example, calculating the number of days between two dates is a particularly hard problem at present.
Now that we have seen what should not be done, let’s look at how Java 8’s java.time API improves date and time handling.
java.time API
The most significant difference between the java.time APIs and the old Date and Calendar is arguably that there are a lot more classes to represent different kinds of date-and-time-related concepts. With the old APIs, when you want to represent a point in time, you’d use Date. For everything else, you’d use Calendar, or a custom data structure that you have to write yourself. For example, to represent the expiry date of a credit card, you can either use two int variables representing the year and month respectively, or a Calendar, ignoring all fields except the year and month; neither is an ideal solution. In Java 8, this is solved by the YearMonth class. If EZDiary were written in Java, the programmer could have used LocalDate to store the date of each entry, which would have avoided the bug.
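The credit card case could look something like the following sketch. The CardExpiry class and the rule that a card stays valid through the last day of its expiry month are assumptions made for the example, not something the API prescribes.

```java
import java.time.LocalDate;
import java.time.YearMonth;

class CardExpiry {
    /** Assumes a card is valid up to and including the last day of its expiry month. */
    static boolean isExpired(YearMonth expiry, LocalDate today) {
        return today.isAfter(expiry.atEndOfMonth());
    }

    public static void main(String[] args) {
        YearMonth expiry = YearMonth.of(2025, 3);
        System.out.println(isExpired(expiry, LocalDate.of(2025, 3, 31))); // false
        System.out.println(isExpired(expiry, LocalDate.of(2025, 4, 1)));  // true
    }
}
```

Because YearMonth has no day-of-month at all, there is no unused field to misread, unlike the Calendar workaround.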
Here is a summary of the commonly-used classes that represent dates and times concepts:
Class | Description | Internal Representation | Example Use Case |
---|---|---|---|
LocalTime | a time without a timezone | hour, minute, second, and nanosecond, all integers | daily alarm |
LocalDate | a date without a timezone in the ISO 8601 (Gregorian) calendar system | year, month, day of month | date of birth |
LocalDateTime | a local date time, a combination of LocalDate and LocalTime | a LocalDate and a LocalTime | time for a birthday party |
YearMonth | a year and a month only, in the ISO 8601 Calendar System | year, month | credit card expiry date |
MonthDay | a month and a day only, in the ISO 8601 Calendar System | month, day of month | any kind of anniversary date |
Instant | a point in time | an integer number of seconds and an integer number of nanosecond-of-second since the epoch | the time when a post is posted on a forum |
ZoneId | a timezone ID that refers to a timezone | abstract class, no internal representation | a user’s preferred timezone on a social network |
ZoneOffset | a subclass of ZoneId that refers to a timezone with no transitions, i.e. a constant offset | the offset from UTC in seconds, and an ID | the offset from UTC of Europe/London at some point in time |
ZonedDateTime | a point in time represented by a date and time in a timezone | a LocalDateTime, a ZoneOffset, a ZoneId | the time a message is sent in an inter-timezone chat app |
OffsetDateTime | a point in time represented by a date and time in a timezone with a constant offset | a LocalDateTime, a ZoneOffset | the time a message is sent in an inter-timezone chat app |
Duration | a time-based amount of time | number of seconds and nanoseconds of second, both integers | time left until a shop closes |
Period | a date-based amount of time in the ISO 8601 calendar system | number of whole years, whole months, and whole days | account age |
A few notes on the above table:
Month
enum.
Some classes (Instant, LocalTime, Duration) use 2 integers (64-bit seconds and 32-bit nanos in the case of Instant) to represent the seconds component. I can think of other ways to do this, such as using a single floating point number or using four integers - seconds, millis, micros and nanos. The former uses less memory but a lot of values cannot be accurately represented, since floating point numbers are not evenly spread along the real number line. The latter saves the calculation when accessing the millisecond and microsecond components, while using up more memory. We should consider the tradeoffs between these factors when designing an API like this.
ZoneId represents a timezone identifier, such as Europe/London, not the timezone itself. It exposes the getRules method (which returns an instance of ZoneRules) to describe the various transitions of the timezone. Therefore, in practice, I think it is okay to think of ZoneId as representing the timezone itself, as there are no other classes in java.time that represent timezones themselves. In my opinion, a date and time API should really have a type that represents timezones.
Although ZoneOffsets are just a fixed offset from UTC, they still have identifiers because ZoneOffset is a subclass of ZoneId. Their identifiers are basically offsets expressed in the ISO 8601 format, e.g. -08:00, +01:00.
There is another subclass of ZoneId called ZoneRegion that is used to represent regional identifiers like Europe/London. The internal representation is what you’d expect - an identifier and an instance of ZoneRules. However, this class is not public so I did not include it in the table.
There are three components in a ZonedDateTime - LocalDateTime, ZoneId and ZoneOffset. If the third one surprises you, consider a timezone function with a single overlap transition. In that timezone, any local date time during the transition corresponds to 2 points in time, creating an ambiguity. To resolve this ambiguity, ZonedDateTime needs the current offset of the timezone.
The difference between Duration and Period is that Duration supports fields that have a fixed length, such as seconds, minutes, and hours, whereas Period supports fields that do not, such as months and years. They both support days but 1 day is represented differently. A Duration of 1 day is exactly 86400 seconds, whereas a Period of 1 day is just 1 day (the day field is set to 1). This difference might seem subtle, but is very significant when adding Durations and Periods to ZonedDateTimes. In the context of ZonedDateTimes, the length of 1 day might not be 86400 seconds, due to DST transitions.
All the classes’ internal representations make a lot of sense, unlike Calendar and Date, which synchronise between 2 representations. The classes also compose well together, e.g. a LocalDateTime is a composition of LocalDate and LocalTime.
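The difference between a Duration of 1 day and a Period of 1 day can be seen with a small sketch. It adds both to noon on 30 March 2019 in Europe/London, the day before clocks went forward there; the class name and the chosen date are only for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class DayLengthDemo {
    // Noon on the day before the UK switched to summer time in 2019.
    static final ZonedDateTime BEFORE =
            ZonedDateTime.of(LocalDateTime.of(2019, 3, 30, 12, 0), ZoneId.of("Europe/London"));

    static ZonedDateTime plusPeriodDay()   { return BEFORE.plus(Period.ofDays(1)); }   // calendar day
    static ZonedDateTime plusDurationDay() { return BEFORE.plus(Duration.ofDays(1)); } // 86400 seconds

    public static void main(String[] args) {
        System.out.println(plusPeriodDay());   // 2019-03-31T12:00+01:00[Europe/London]
        System.out.println(plusDurationDay()); // 2019-03-31T13:00+01:00[Europe/London]
        // The calendar day that was added is only 23 hours long.
        System.out.println(Duration.between(BEFORE, plusPeriodDay()).toHours()); // 23
    }
}
```

Which of the two you want depends on the use case: "same time tomorrow" is a Period, "24 hours from now" is a Duration.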
By providing so many different classes, the java.time
API forces its users to think carefully about their use case before choosing which class to use. This is good API design, because it makes sure that programmers are writing code that they actually understand, instead of blindly using Calendar
or Date
all the time. If you are writing your own dates and times API, I strongly suggest that you do this as well, because there is a high chance that you will need to represent more than one kind of date/time data.
All the classes shown in the table, and many more in the package, are immutable, and hence thread-safe. All the fields are declared final, which means that they can’t change once they are initialised. For example, LocalDateTime.plus does not change the state of the LocalDateTime on which it is called, but returns a new instance of LocalDateTime. This fixes the huge problem of the lack of thread-safety in the old APIs. Classes representing date and time concepts should be immutable because the concepts they represent are essentially “simple values” rather than “objects with state”. If your language supports value types (such as C# or Swift), these classes should be implemented as value types instead. These kinds of classes are extremely well suited to immutability.
java.time
supports a total of 5 calendar systems:
Calendar System | Represented By | Corresponding Date Class |
---|---|---|
The ISO 8601 Calendar | IsoChronology | LocalDate |
The Hijrah Calendar (Islamic Calendar) | HijrahChronology | HijrahDate |
The Japanese Calendar | JapaneseChronology | JapaneseDate |
The Minguo Calendar | MinguoChronology | MinguoDate |
The Thai Buddhist Calendar | ThaiBuddhistChronology | ThaiBuddhistDate |
You should only use these specialised date classes when you need to do calendar-system-specific calculations. For example, adding a month to a date in the Hijrah Calendar does not necessarily produce the same result as in the ISO 8601 Calendar, because the lengths of months in the two calendars are different. On the other hand, if you just want to format a date in a particular calendar system for localisation purposes, you should not use these date classes, because they are not for formatting. Use a DateTimeFormatter with a custom chronology set instead, e.g.
String formatted = DateTimeFormatter.ofLocalizedDate(FormatStyle.FULL)
.withChronology(JapaneseChronology.INSTANCE)
// this is actually the year 2001. java.time uses 1-based years as well
.format(LocalDate.of(2001, 2, 23)); // even LocalDates can be formatted in the Japanese Calendar
System.out.println(formatted); // 平成13年2月23日
In addition to classes, the java.time
package also contains a lot of interfaces that define what classes can do. By using these interfaces in method signatures, the flexibility and extensibility of the whole framework is increased by a lot. We can also learn a lot about the nature of the classes by looking at what interfaces they implement and which they do not.
We can split the concrete classes shown in the first table of the preceding section into two categories:
Category 1 | Category 2 |
---|---|
LocalTime, LocalDate, LocalDateTime, YearMonth, MonthDay, Instant, ZoneOffset, ZonedDateTime, OffsetDateTime | Period, Duration |
It should be somewhat intuitive why they are categorised this way, except maybe for ZoneOffset. Specifically, every member of Category 1 has a set of temporal fields (temporal fields are represented by classes that implement TemporalField) that can be accessed. For example, LocalDate supports the fields “year”, “month-of-year”, and “day-of-month”, among many others. (Note that it is important, when referring to temporal fields, to not just say “day”, “second”, or “week”. It is ambiguous whether you mean “day-of-year”, “day-of-month” or “day-of-week” if you just say “day”, for example.) Therefore, this category is called the temporal accessors, represented by classes that implement the TemporalAccessor interface. ZoneOffset implements TemporalAccessor because the offset is a temporal field too (also supported by OffsetDateTime and ZonedDateTime).
Every member in Category 2 represents an amount of time. Therefore, they are the temporal amounts, represented by classes implementing the TemporalAmount
interface. A temporal amount has a set of temporal units (represented by classes that implement TemporalUnit
) that defines it. Period
is defined by the units “day”, “month” and “year”, for example. It is important to understand that the temporal unit of “month” is not the same as the temporal field of “month-of-year”, the temporal unit of “year” is not the same as the temporal field of “year(-of-era)”, and so on.
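The distinction between fields and units shows up directly in the API: fields are read from a temporal accessor via ChronoField, while units describe what a temporal amount is made of. The following is a small sketch with a hypothetical class name and an arbitrary date.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.Period;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalUnit;
import java.util.List;

class FieldsVersusUnits {
    static final LocalDate DATE = LocalDate.of(2019, 2, 23); // a Saturday

    // "day" alone is ambiguous; each of these is a different temporal field.
    static int dayOfMonth() { return DATE.get(ChronoField.DAY_OF_MONTH); }
    static int dayOfYear()  { return DATE.get(ChronoField.DAY_OF_YEAR); }
    static int dayOfWeek()  { return DATE.get(ChronoField.DAY_OF_WEEK); }

    // Temporal amounts are defined by units rather than fields.
    static List<TemporalUnit> periodUnits()   { return Period.ofDays(1).getUnits(); }
    static List<TemporalUnit> durationUnits() { return Duration.ofHours(1).getUnits(); }

    public static void main(String[] args) {
        System.out.println(dayOfMonth() + " " + dayOfYear() + " " + dayOfWeek()); // 23 54 6
        System.out.println(periodUnits());   // [Years, Months, Days]
        System.out.println(durationUnits()); // [Seconds, Nanos]
        // A LocalDate has no time-of-day fields at all.
        System.out.println(DATE.isSupported(ChronoField.HOUR_OF_DAY)); // false
    }
}
```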
The Temporal interface is a subinterface of TemporalAccessor. It represents the subset of temporal accessors that have well-defined plus and minus operations. For example, LocalDate implements this interface because the plus and minus operations are well-defined for it (as long as the TemporalAmount added to/subtracted from it is defined by units that are supported by LocalDate). MonthDay does not implement Temporal because adding, say, one day to 28 February is not well-defined unless we know what year it is, and neither is subtracting one day from 1 March.
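Here is a short sketch of the 28 February example. The class and method names are invented; the point is that MonthDay has to be combined with a year (atYear) before any arithmetic is possible.

```java
import java.time.LocalDate;
import java.time.MonthDay;
import java.time.temporal.Temporal;

class MonthDayNeedsYear {
    static final MonthDay FEB_28 = MonthDay.of(2, 28);

    /** The day after 28 February depends on the year, so a year must be supplied. */
    static LocalDate dayAfterFeb28(int year) {
        return FEB_28.atYear(year).plusDays(1);
    }

    static boolean isTemporal(Object o) {
        return o instanceof Temporal;
    }

    public static void main(String[] args) {
        System.out.println(dayAfterFeb28(2019)); // 2019-03-01
        System.out.println(dayAfterFeb28(2020)); // 2020-02-29
        System.out.println(isTemporal(LocalDate.of(2019, 2, 28))); // true
        System.out.println(isTemporal(FEB_28));                    // false
    }
}
```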
To the framework, these interfaces are very useful for creating general methods that can operate on any temporal amount, any temporal accessor, or any temporal. For example, LocalDate has a factory method called from(TemporalAccessor), implemented like this:
public static LocalDate from(TemporalAccessor temporal) {
Objects.requireNonNull(temporal, "temporal");
LocalDate date = temporal.query(TemporalQueries.localDate());
if (date == null) {
throw new DateTimeException("Unable to obtain LocalDate from TemporalAccessor: " +
temporal + " of type " + temporal.getClass().getName());
}
return date;
}
where TemporalQueries.localDate()
returns this:
static final TemporalQuery<LocalDate> LOCAL_DATE = (temporal) -> {
if (temporal.isSupported(EPOCH_DAY)) {
return LocalDate.ofEpochDay(temporal.getLong(EPOCH_DAY));
}
return null;
};
Therefore, instead of saying that from(TemporalAccessor) can be applied to any temporal accessor, it is more accurate to say that it only works on temporal accessors that support the epoch-day temporal field. Some might argue that the API should not be designed like this because using interfaces as parameters makes the methods unsafe, as there is no compile-time check that ensures the parameter supports the required temporal fields, but I disagree. It is seldom useful to work with a general TemporalAccessor object in application code, as we do not know what calendar system it uses, or what time zone (if any) it is in. If we are working with a specific type of temporal accessor, then as long as our variables are well-named, we can easily spot nonsense like this (or avoid writing it in the first place):
// departureTime is a LocalTime, we obviously can't get a date out of a time.
LocalDate departureDate = LocalDate.from(departureTime);
In addition, the alternative of adding specialised interfaces like EpochDayAccessor
for each temporal field has the disadvantage of increasing complexity.
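If you do end up handling a general TemporalAccessor, the failure is at least easy to detect at run time. The helper below is a hypothetical sketch that wraps LocalDate.from and reports failure as an empty Optional; checking isSupported(ChronoField.EPOCH_DAY) first is an alternative for the standard classes.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;
import java.util.Optional;

class FromAccessor {
    /** Extracts a LocalDate if the accessor can provide one, otherwise returns empty. */
    static Optional<LocalDate> tryDate(TemporalAccessor t) {
        try {
            return Optional.of(LocalDate.from(t));
        } catch (DateTimeException e) {
            return Optional.empty(); // e.g. a LocalTime has no epoch-day field
        }
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = ZonedDateTime.of(2019, 2, 23, 10, 0, 0, 0, ZoneOffset.UTC);
        System.out.println(tryDate(zdt));            // Optional[2019-02-23]
        System.out.println(tryDate(LocalTime.NOON)); // Optional.empty
        System.out.println(LocalTime.NOON.isSupported(ChronoField.EPOCH_DAY)); // false
    }
}
```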
If you are writing your own date and time API, you could consider adding these interfaces, but not including them won’t cause you to lose much.
By now, you should have realised that some of the classes in the java.time API can be mapped onto the concepts I have introduced earlier quite easily.
LocalDateTime pretty much is a point on the local date time line (i.e. y axis of timezone diagrams), though it only has nanosecond-precision, instead of being a “continuous” line.
Instant represents a point on the Universal Time line (i.e. x axis of timezone diagrams) with nanosecond-precision as well.
ZoneIds, though this is not technically true, can be thought of as timezone functions for convenience’s sake. By this logic, ZoneOffsets can be thought of as timezone functions that are continuous straight lines.
LocalDates and dates of the other calendar systems can be thought of as sections of the local date time line of length 86400 seconds, starting and ending at the time 00:00.
ZonedDateTime and similarly OffsetDateTime represent a point that the timezone function goes through.
With that in mind, we can then think of calling methods on these date and time classes as moving around on a timezone diagram. This diagram shows some of the operations you can do to convert between LocalDateTime, OffsetDateTime/ZonedDateTime and Instant.
The timezone function shown has 2 offsets - offset1
is the standard offset and offset2
is the DST offset. i
, ldt
, zdt
are instances of Instant
, LocalDateTime
and ZonedDateTime
respectively. The parameters that these methods take should not surprise you at this point, and the return types should be clear by looking at the method names. A point that I have not shown here is that Instant
and LocalDateTime
also have an atZone
method that takes a ZoneId
that converts the Instant
/LocalDateTime
to a ZonedDateTime
. For Instant
, this makes sense, since this operation is equivalent to starting on some point on the x axis and moving up until you reach the timezone function represented by the ZoneId
(there will be exactly one such point, as I established earlier). For LocalDateTime
though, this method seems quite a weird one to have, as there can be a transition at that local date time, meaning that a horizontal line from a point on the y axis will cross 0 or 2 points on the timezone function. The behaviour of this method at a transition is documented (emphasis mine):
[…] In the case of an overlap, where clocks are set back, there are two valid offsets. This method uses the earlier offset typically corresponding to “summer”.
In the case of a gap, where clocks jump forward, there is no valid offset. Instead, the local date-time is adjusted to be later by the length of the gap. For a typical one hour daylight savings change, the local date-time will be moved one hour later into the offset typically corresponding to “summer”.
To obtain the later offset during an overlap, call ZonedDateTime.withLaterOffsetAtOverlap() on the result of this method. To throw an exception when there is a gap or overlap, use ZonedDateTime.ofStrict(LocalDateTime, ZoneOffset, ZoneId).
Nevertheless, you should think twice before using LocalDateTime.atZone
. Think about whether what it does at a transition is really what you want to happen. Though it is not always possible due to not having a ZoneOffset
, using ZonedDateTime.ofStrict
is much safer. ofStrict
will throw an exception, instead of silently giving you the wrong result.
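The documented behaviour of atZone at a gap and at an overlap can be checked with a sketch like the one below. It uses the 2019 transitions of Europe/London (clocks forward on 31 March, back on 27 October); the class and helper names are made up.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class AtZoneAtTransitions {
    static final ZoneId LONDON = ZoneId.of("Europe/London");
    static final LocalDateTime IN_GAP = LocalDateTime.of(2019, 3, 31, 1, 30);      // skipped by the clocks
    static final LocalDateTime IN_OVERLAP = LocalDateTime.of(2019, 10, 27, 1, 30); // happens twice

    /** True if ofStrict refuses the combination of local date time, offset and zone. */
    static boolean strictRejects(LocalDateTime ldt, ZoneOffset offset) {
        try {
            ZonedDateTime.ofStrict(ldt, offset, LONDON);
            return false;
        } catch (DateTimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Gap: the local time is silently pushed later by the length of the gap.
        System.out.println(IN_GAP.atZone(LONDON)); // 2019-03-31T02:30+01:00[Europe/London]
        // Overlap: the earlier (summer) offset is chosen unless asked otherwise.
        ZonedDateTime zdt = IN_OVERLAP.atZone(LONDON);
        System.out.println(zdt.getOffset());                            // +01:00
        System.out.println(zdt.withLaterOffsetAtOverlap().getOffset()); // Z
        // ofStrict refuses a local date time that does not exist.
        System.out.println(strictRejects(IN_GAP, ZoneOffset.ofHours(1))); // true
    }
}
```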
Here is the same timezone diagram again, but with different annotations, showing some of the common operations that can be applied to a ZoneId
called zid
. This time, the return values of the operations presented here are not points on the diagram, but lines:
getRules().getTransition() returns the transition (represented by the vertical dotted line in the diagram, or a ZoneOffsetTransition object in code) at a local date time, whereas getRules().getOffset() and getRules().getStandardOffset() return the offset at an instant. In case it is not clear enough, getOffset(i) returns offset2 and getStandardOffset(i) returns offset1. There is also a getOffset that accepts a LocalDateTime, which, as you might have noticed, suffers from the same problem as LocalDateTime.atZone. The documentation says that it returns the offset before the transition:
[…] Thus, for any given local date-time there can be zero, one or two valid offsets. This method returns the single offset in the Normal case, and in the Gap or Overlap case it returns the offset before the transition.
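These ZoneRules queries can be tried out with a sketch like the following, again using Europe/London in 2019. The class name and constants are hypothetical; getValidOffsets is shown as a way to find out how many offsets a local date time has before relying on getOffset(LocalDateTime).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

class ZoneRulesQueries {
    static final ZoneRules RULES = ZoneId.of("Europe/London").getRules();
    static final Instant SUMMER = Instant.parse("2019-07-01T12:00:00Z");
    static final LocalDateTime IN_GAP = LocalDateTime.of(2019, 3, 31, 1, 30);
    static final LocalDateTime IN_OVERLAP = LocalDateTime.of(2019, 10, 27, 1, 30);

    public static void main(String[] args) {
        // Queries on an instant are never ambiguous.
        System.out.println(RULES.getOffset(SUMMER));         // +01:00
        System.out.println(RULES.getStandardOffset(SUMMER)); // Z
        // A local date time can have zero, one or two valid offsets.
        System.out.println(RULES.getValidOffsets(IN_GAP).size());     // 0
        System.out.println(RULES.getValidOffsets(IN_OVERLAP).size()); // 2
        System.out.println(RULES.getTransition(IN_GAP).isGap());      // true
        // getOffset(LocalDateTime) falls back to the offset before the transition.
        System.out.println(RULES.getOffset(IN_GAP));     // Z
        System.out.println(RULES.getOffset(IN_OVERLAP)); // +01:00
    }
}
```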
Personally, I think it would make for a much clearer API if methods like LocalDateTime.atZone and ZoneRules.getOffset(LocalDateTime) accepted an extra parameter of type ZoneRulesAmbiguityResolver, a functional interface that would be invoked when there is a transition at the local date time provided. (There could even be another overload that takes 2 of these parameters, one for each type of transition.) Here is some sample code:
@FunctionalInterface
interface ZoneRulesAmbiguityResolver {
    LocalDateTime resolve(LocalDateTime original, ZoneOffsetTransition trans);

    // convenient resolvers
    static ZoneRulesAmbiguityResolver alwaysAfter() {
        return (ldt, trans) -> trans.getDateTimeAfter();
    }
    static ZoneRulesAmbiguityResolver alwaysBefore() {
        return (ldt, trans) -> trans.getDateTimeBefore();
    }
    static ZoneRulesAmbiguityResolver plusDuration() {
        return (ldt, trans) -> ldt.plus(trans.getDuration());
    }
}

// inside LocalDateTime class (an instance method, since it refers to this)
public ZonedDateTime atZone(ZoneId zid, ZoneRulesAmbiguityResolver resolver) {
    ZoneOffsetTransition trans = zid.getRules().getTransition(this);
    if (trans == null) {
        // no transition at this local date time, so there is nothing to resolve
        return ZonedDateTime.of(this, zid);
    } else {
        return ZonedDateTime.of(resolver.resolve(this, trans), zid);
    }
}

// usage:
ZonedDateTime zdt = someLdt.atZone(ZoneId.of("Europe/London"), ZoneRulesAmbiguityResolver.alwaysAfter());
Another interesting method is LocalDate.atStartOfDay
, which converts a LocalDate
into a ZonedDateTime
by setting the time to the start of the day in a timezone (ZoneId
), given as a parameter. (I mentioned this towards the beginning as well) This method is implemented like this:
public ZonedDateTime atStartOfDay(ZoneId zone) {
Objects.requireNonNull(zone, "zone");
// need to handle case where there is a gap from 11:30 to 00:30
// standard ZDT factory would result in 01:00 rather than 00:30
LocalDateTime ldt = atTime(LocalTime.MIDNIGHT);
if (zone instanceof ZoneOffset == false) {
ZoneRules rules = zone.getRules();
ZoneOffsetTransition trans = rules.getTransition(ldt);
if (trans != null && trans.isGap()) {
ldt = trans.getDateTimeAfter();
}
}
return ZonedDateTime.of(ldt, zone);
}
Most of the time, the LocalDate will be combined with the LocalTime of MIDNIGHT, which is a constant representing 00:00, and then combined with the ZoneId using ZonedDateTime.of. ZonedDateTime.of's behaviour is exactly the same as LocalDateTime.atZone (in fact, the latter simply calls the former). The only special case the method handles is when there is a gap transition at the local date time in the given timezone function, in which case the nearest local date time after the transition is used (getDateTimeAfter). In this case, the ZonedDateTime created corresponds to the top end of the line segment representing the transition. In the case of an overlap, nothing special needs to be done because the default behaviour of ZonedDateTime.of (returning the point before the transition instead of the one after) is desired.
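To see the gap case in action, we need a timezone whose clocks jump forward exactly at midnight. The sketch below assumes the tz database entry for America/Sao_Paulo, where summer time in 2017 started at 00:00 on 15 October; the class and method names are invented for the example.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;

class StartOfDayInGap {
    static final ZoneId SAO_PAULO = ZoneId.of("America/Sao_Paulo");

    /** The local time at which the given date starts in Sao Paulo. */
    static LocalTime startTime(int year, int month, int day) {
        return LocalDate.of(year, month, day).atStartOfDay(SAO_PAULO).toLocalTime();
    }

    public static void main(String[] args) {
        System.out.println(startTime(2017, 10, 14)); // 00:00, an ordinary day
        System.out.println(startTime(2017, 10, 15)); // 01:00, midnight did not exist
    }
}
```

This is why "the start of the day" should never be hard-coded as 00:00 when a timezone is involved.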
We have now seen how closely related the java.time
API and the model of timezones that I have introduced are. I will now present an example of how I have used the model in conjunction with the API to answer a question on Stack Overflow.
The question essentially asks how to convert a LocalDateTime
called ldt
received from a web-service like this one to an Instant
called i
. Normally, this could be easily done like this:
// getting the local date time and timezone is irrelevant, hence omitted
LocalDateTime ldt = ...;
ZoneId zid = ...;
ZoneOffset offset = zid.getRules().getOffset(ldt);
Instant i = ldt.toInstant(offset);
We can safely use getOffset(LocalDateTime)
here because we assume that the web-service is always going to give valid local date times in the timezone, so there will never be a gap transition at ldt
. There could be an overlap transition, but the web-service does not give enough information to disambiguate anyway.
However, there is a catch. The local date times that the web-service produces are all in standard time. It is as if DST is not taken into account at all. This means that to convert the local date times, we need to use the standard offset (the offset from getStandardOffset) at that local date time. On a timezone diagram, there would be two timezone functions - one for the function represented by zid, and the other for the function that would have been produced if zid did not observe DST. The two functions overlap in many places. Where they do not overlap, I have used a red line to indicate the latter function. Essentially, what we are trying to do is this:
There is only a slight problem - there is no way to directly get the red line (standard offset) from a point on the y-axis (local date time) because getStandardOffset
accepts a parameter of type Instant
. We can only get the red line from a point on the x-axis. Therefore, one way is to first move to the x-axis (by toInstant
) using the black line (getOffset(ldt)
), then get the red line from there. After we have got the red line, we can move to the x-axis using it. On a timezone diagram it looks like this:
In code, it looks like this (I have made the variable names match the ones shown in the diagram):
// get the black line corresponding to ldt
ZoneOffset offset2 = zid.getRules().getOffset(ldt); // 1
// move to the x-axis using black line
Instant i1 = ldt.toInstant(offset2); // 1, 2
// get the red line by moving up from the point we just moved to
ZoneOffset offset1 = zid.getRules().getStandardOffset(i1); // 3
// move to the x-axis using the red line
Instant i2 = ldt.toInstant(offset1); // 4, 5
// i2 is the final result
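The same steps can be wrapped into a self-contained method so that they can be tried out. This is only a sketch: the class name, the zone America/New_York and the sample date are assumptions of mine, and the method inherits the limitations of the approach itself.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

class StandardTimeConversion {
    // Interprets ldt as a local date time expressed in the standard time of zid.
    static Instant toInstant(LocalDateTime ldt, ZoneId zid) {
        ZoneRules rules = zid.getRules();
        // Move to the x-axis using the actual offset (the black line).
        ZoneOffset actualOffset = rules.getOffset(ldt);
        Instant i1 = ldt.toInstant(actualOffset);
        // Look up the standard offset (the red line) at that instant.
        ZoneOffset standardOffset = rules.getStandardOffset(i1);
        // Move to the x-axis again, this time using the standard offset.
        return ldt.toInstant(standardOffset);
    }

    public static void main(String[] args) {
        // Hypothetical sample values: the actual offset in July is -04:00,
        // but the standard offset is -05:00, so noon maps to 17:00 UTC.
        LocalDateTime ldt = LocalDateTime.of(2021, 7, 1, 12, 0);
        System.out.println(toInstant(ldt, ZoneId.of("America/New_York"))); // 2021-07-01T17:00:00Z
    }
}
```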
From the diagram, we can see that this approach relies on the standard offset found in step 3 being the same standard offset we would have found if we moved directly to the right from the y-axis (the correct standard offset). To find out whether they will always be the same offset no matter the timezone function, I first considered the trivial case of the standard offset being constant all the time. Since there is only one standard offset, the standard offset that we get from step 3 must be the correct standard offset. There is no other offset that it could be! Whether the actual offset changes does not matter here. Another trivial case is that the standard offset changes, but is always the same as the actual offset. It should be clear that our approach will produce the correct result in this case as well, albeit in a roundabout way (line 4 would be doing the same thing as line 2).
Then I considered the case of a constant actual offset, but a change in the standard offset:
For the diagram on the left, if ldt
were between ldt2
and ldt3
, it is ambiguous which point in time it corresponds to anyway, so we don’t need to worry too much about it. If ldt
were between ldt1
and ldt2
however, step 3 would produce offset2
, whereas the correct offset is offset1
. For all other local date times, the situation is the same as the first trivial case. Therefore, this approach will only work if offset3
and offset2
are the same, causing there to be no gap between ldt1
and ldt2
. A similar argument can be made for the diagram on the right.
Trying to fix this, I looked at the implementation of ZoneRules
and found out that it stores the times and offsets of standard transitions in two parallel arrays:
/**
 * The transitions between standard offsets (epoch seconds), sorted.
 */
private final long[] standardTransitions;
/**
 * The standard offsets.
 */
private final ZoneOffset[] standardOffsets;
standardTransitions
is searched in getStandardOffset
, but we are never allowed to get its values directly. getTransitions
only returns the transitions in the actual offset. On the other hand, the transitions in actual offset are also stored in an array of LocalDateTime
:
/**
 * The transitions between local date-times, sorted.
 * This is a paired array, where the first entry is the start of the transition
 * and the second entry is the end of the transition.
 */
private final LocalDateTime[] savingsLocalTransitions;
Since there is no way to access the standardTransitions
array and the standardOffsets
array, there is nothing we can do in this situation.
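To make clear what the public API does expose, here is a small sketch that asks whether the standard offset changes at a given transition in the actual offset. The class and method names are hypothetical. Note that it can only inspect transitions that ZoneRules reports, so a change in the standard offset alone, with a constant actual offset, stays invisible to it, which is exactly the problematic case.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

class StandardOffsetProbe {
    // Returns true if the standard offset just before the transition differs
    // from the standard offset at the transition instant.
    static boolean standardOffsetChangesAt(ZoneRules rules, ZoneOffsetTransition transition) {
        Instant at = transition.getInstant();
        ZoneOffset before = rules.getStandardOffset(at.minusSeconds(1));
        ZoneOffset after = rules.getStandardOffset(at);
        return !before.equals(after);
    }

    public static void main(String[] args) {
        // Hypothetical sample zone: an ordinary DST transition in New York
        // changes the actual offset but not the standard offset.
        ZoneRules rules = ZoneId.of("America/New_York").getRules();
        ZoneOffsetTransition next = rules.nextTransition(Instant.parse("2021-01-01T00:00:00Z"));
        System.out.println(standardOffsetChangesAt(rules, next)); // false
    }
}
```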
I tried to explore other cases and found that for one case, a modification of my approach would make it produce correct results for all local date times: a transition where the standard offset and the actual offset change by the same amount, and that amount is greater than or equal to the difference between the standard and actual offsets. For all other cases, there is always a range of local date times that we can do nothing about. That one case is illustrated in these diagrams:
In both diagrams, a local date time between ldt1
and ldt2
would cause step 3 to produce offset1
, while the offset it should have produced is offset3
. This is due to getOffset
always returning the offset before the transition. Therefore, if I modify the code so that it uses the offset after the transition, then the offset produced will be correct:
Instant i1;
ZoneOffsetTransition transition = zid.getRules().getTransition(ldt);
if (transition == null) {
    // ldt is not in a gap or an overlap, so getOffset is unambiguous
    ZoneOffset offset2 = zid.getRules().getOffset(ldt);
    i1 = ldt.toInstant(offset2);
} else {
    // ldt is inside a transition, so use the offset after it
    ZoneOffset offset2 = transition.getOffsetAfter();
    i1 = ldt.toInstant(offset2);
}
ZoneOffset offset1 = zid.getRules().getStandardOffset(i1);
Instant i2 = ldt.toInstant(offset1);
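For completeness, here is the modified approach as a self-contained method that can be run. As before, this is a sketch under assumptions: the class name, the zone America/New_York and the sample dates are mine, and the zone is an ordinary DST zone rather than one with the unusual transitions discussed above.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

class TransitionAwareStandardTimeConversion {
    // Interprets ldt as standard time in zid, preferring the offset after a
    // transition whenever ldt falls inside a gap or an overlap.
    static Instant toInstant(LocalDateTime ldt, ZoneId zid) {
        ZoneRules rules = zid.getRules();
        ZoneOffsetTransition transition = rules.getTransition(ldt);
        ZoneOffset actualOffset = (transition == null)
                ? rules.getOffset(ldt)
                : transition.getOffsetAfter();
        Instant i1 = ldt.toInstant(actualOffset);
        ZoneOffset standardOffset = rules.getStandardOffset(i1);
        return ldt.toInstant(standardOffset);
    }

    public static void main(String[] args) {
        ZoneId zid = ZoneId.of("America/New_York");
        // Hypothetical sample: 01:30 on 7 November 2021 lies in the overlap at the end of DST.
        System.out.println(toInstant(LocalDateTime.of(2021, 11, 7, 1, 30), zid)); // 2021-11-07T06:30:00Z
        // An ordinary summer date is handled as before.
        System.out.println(toInstant(LocalDateTime.of(2021, 7, 1, 12, 0), zid)); // 2021-07-01T17:00:00Z
    }
}
```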
Governments very rarely change the standard offset anyway. Even if they do, they are more likely to either change only the standard offset or change the standard and actual offsets at the same time, by the same amount, usually an integer number of hours, so it is very unlikely that my code will not work.
When writing date-and-time-related code, we should constantly think about what could potentially break our code, whether it is a different locale, a different timezone, a different era, or a different calendar system. We should judge how likely these situations are to occur and decide whether to fix the problem now or to make it easier to fix when it happens. To achieve this, we must think about timezones with a clear mind, consider carefully what type of data we are manipulating, and create the corresponding data structures if necessary.
Asia/Hong_Kong
on 15 June 1941 from 03:30 to 03:00. What effects does this have on software systems?