IDs can be defined in a variety of ways. The traditional method is to use an auto-incrementing integer. When working with databases, it's a common belief that the ID generated by the database is sufficient. However, modern applications are far more complex, and they have requirements that integer IDs can't always satisfy. The reality is that many developers don't give unique identifiers much thought when designing a database. You can make better decisions in your day-to-day work if you know what unique identifiers are and how to balance their trade-offs in your code and data storage.
In this article, you'll learn why integer IDs can't meet the needs of modern applications and how unique identifiers such as UUID and ULID can better address modern application requirements.
As mentioned earlier, the most prominent method software applications have used to identify a specific piece of data is an incrementing integer. There are good reasons for this: integer IDs are easy to store and sort, easy to understand, and can be generated automatically by the database.
Nevertheless, there are many drawbacks to using incrementing integers as identifiers, especially in applications built with the microservices approach. A significant problem is that multiple nodes can't generate these identifiers independently and at the same time without coordinating with each other. There are workarounds, like having each node skip values or start from a different value, but they are far from efficient. Moving ID generation into a separate service can introduce a single point of failure, and it affects performance because every new ID requires a round trip between the requesting application and the ID generation service.
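To make the "skipping values" workaround concrete, here is a minimal Python sketch. The function name `strided_ids` and the node numbering are assumptions made for illustration, not part of any particular database's API. Each node starts from a different value and skips ahead by the total number of nodes.

```python
from itertools import count, islice


def strided_ids(node_index: int, node_count: int):
    """Yield integer IDs for one node: a distinct start value per node,
    then a step equal to the total number of nodes."""
    if not 0 <= node_index < node_count:
        raise ValueError("node_index must be in range(node_count)")
    return count(start=node_index + 1, step=node_count)


# Three hypothetical nodes, each taking its first three IDs.
node_a = list(islice(strided_ids(0, 3), 3))
node_b = list(islice(strided_ids(1, 3), 3))
node_c = list(islice(strided_ids(2, 3), 3))
print(node_a, node_b, node_c)  # [1, 4, 7] [2, 5, 8] [3, 6, 9]
```

The ranges never overlap, but the node count is baked into every generator, so adding a fourth node means reconfiguring all of them. That is one reason this approach scales poorly.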
Incrementing IDs are also relatively easy to guess, which makes them vulnerable to brute force and other types of attacks that might leak business data. Attackers will target an application with easily guessable identifiers in URLs and other locations because they are significantly easier to exploit. IDs can also reveal more information than you might expect, for example, how many orders you received between two dates or the size of a specific dataset. This is valuable information that you don't want to fall into the wrong hands because it could mean huge problems for your business.
Using a random value from a large numerical range instead of a small numerical sequence is a realistic solution that addresses most of these problems. The universally unique identifier, or UUID as it's commonly known, is an example of how this idea is put into practice.
A UUID is a type of identifier that can be safely considered unique for most practical purposes and is helpful in any case where distributed unique ID generation is required. The chances of two properly generated UUIDs being identical are virtually zero, even if they are generated in separate environments.
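To put a number on "virtually zero", the following Python sketch uses the standard birthday-bound approximation. It assumes IDs are drawn uniformly at random with 122 random bits each, which is the case for random (version 4) UUIDs. The function name is chosen here for illustration.

```python
import math


def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Approximate probability that at least two of n_ids uniformly random
    IDs collide: p ~= 1 - exp(-n(n-1) / (2 * 2**random_bits))."""
    space = 2 ** random_bits
    exponent = n_ids * (n_ids - 1) / (2 * space)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


# One billion random 122-bit IDs: the result is on the order of 1e-19.
print(collision_probability(1_000_000_000))

# For contrast, 100,000 random IDs in a 32-bit space collide more often than not.
print(collision_probability(100_000, random_bits=32))
```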
UUIDs are 128-bit identifiers that are generated and represented in standardized formats. RFC 4122 is the specification that defines valid UUIDs. It describes algorithms that can generate unique UUIDs without a centralized issuing authority. The RFC includes five algorithms, or versions, each of which uses a different mechanism to generate a value. These versions are:
- Versions 1 & 2. These are time-based UUIDs generated by combining a timestamp, a clock sequence, and a node identifier, which is usually the device's MAC address. Having identical UUIDs is nearly impossible when UUIDs are generated in this way. Because the node identifier is embedded in the value, version 1 & 2 UUIDs can be used to determine, for example, which database node generated the ID. This can be advantageous, particularly in distributed systems.
- Versions 3 & 5. These UUIDs are generated by hashing a namespace identifier and a name. The namespace is itself a UUID, and the name can be any string. The two values are hashed together, and the hash is truncated to 128 bits and formatted as the 36-character string used as the final UUID. Because the process is deterministic, the same namespace and name always produce the same UUID. UUID versions 3 and 5 differ primarily in that they rely on different hashing algorithms. Version 3 uses MD5, while version 5 uses SHA-1.
- Version 4. These UUIDs are generated from random data. In the 36-character string form, apart from the four hyphens, the '4' that indicates the UUID's version, and a few bits that indicate the variant, every character is a random hexadecimal digit (0-9 and a-f), which leaves 122 random bits. The generation is entirely random, making the possibility of having the same IDs almost non-existent. Additionally, they don't contain identifying information like a timestamp, MAC address, or name data, which could be a good or a bad thing depending on the use case.
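As a quick illustration of the versions listed above, here is a short sketch using Python's standard `uuid` module. The name "example.com" is only an illustrative input. The sketch skips version 2 because it relies only on the module's `uuid1`, `uuid3`, `uuid4`, and `uuid5` functions.

```python
import uuid

# Version 1: timestamp + clock sequence + node ID (may expose the host's MAC address).
v1 = uuid.uuid1()

# Versions 3 and 5: deterministic hashes of a namespace UUID and a name
# (MD5 for version 3, SHA-1 for version 5).
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

# Version 4: generated from random data.
v4 = uuid.uuid4()

for u in (v1, v3, v4, v5):
    # Each line shows the version number, the 36-character form, and its length.
    print(u.version, u, len(str(u)))

# Hashing the same namespace and name again yields the same UUID.
print(v5 == uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))  # True
```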
UUIDs offer several advantages:
- UUID values are unique across databases, tables, and servers. This allows you to merge rows from multiple databases or distribute databases across multiple servers.
- Randomly generated (version 4) UUID values don't give out information about your data, so they're safe to use in a URL.
- UUID values can be created anywhere, eliminating the need to go back and forth to the database and simplifying the application's logic.
They also have some drawbacks:
- UUID values need more storage space than integers.
- They're harder to read and compare by eye, which makes debugging more difficult.
- Because of their size and lack of order, UUID values may cause performance issues.
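The storage point can be checked directly. This Python sketch compares the size of a UUID in its text and binary forms with an 8-byte integer key. The column types mentioned in the comments are examples only, and the exact on-disk cost depends on the database.

```python
import uuid

u = uuid.uuid4()

as_text = str(u)    # 36 characters, e.g. stored in a CHAR(36) column
as_bytes = u.bytes  # 16 raw bytes, e.g. stored in a BINARY(16) column
as_bigint = (2**63 - 1).to_bytes(8, "big")  # largest signed 64-bit key: 8 bytes

print(len(as_text), len(as_bytes), len(as_bigint))  # 36 16 8

# The compact binary form converts back to the same UUID.
print(uuid.UUID(bytes=as_bytes) == u)  # True
```

Storing the 16-byte binary form instead of the 36-character string is a common way to reduce the overhead, though it is still twice the size of a 64-bit integer.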
Applications of UUIDs
UUIDs are suitable for a wide range of use cases. Here are a few typical applications for UUIDs.
UUIDs can be generated from a web application's front end without communicating with the server or a database. They are a good option for identifying data objects such as users.
Third-party tools that integrate with an application might need to generate unique identifiers for various reasons. For example, a marketing team that wants more insight into its users might generate a UUID to track a particular user's behavior during a single session.
As mentioned earlier, UUIDs are useful in distributed database systems. You can split or merge large tables and store them across multiple servers. Each section of the same table can be assigned the same UUID, allowing the data to be identified as a single set when read/write operations are performed.
Despite being one of the most commonly used unique identifiers in software development, UUID is not a one-size-fits-all solution. Many other implementations of unique identifiers have been developed that aim to solve some of the problems UUIDs face. One of the most popular is ULID.
Universally Unique Lexicographically Sortable Identifier, or ULID, is a 128-bit identifier represented as a 26-character string: ten timestamp characters that provide millisecond precision, followed by sixteen randomly generated characters. The two parts encode 48 bits and 80 bits, respectively, using Crockford's base32 alphabet. This structure keeps the string unique for practical purposes while also allowing it to be sorted.
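The layout described above can be sketched in a few lines of Python using only the standard library. This is an illustrative implementation under the assumptions stated in the comments, not a replacement for a maintained ULID library. The function names `encode_ulid` and `new_ulid` are chosen here for the example.

```python
import os
import time

# Crockford's base32 alphabet (no I, L, O, or U), in ascending ASCII order.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms: int, randomness: bytes) -> str:
    """Pack a 48-bit millisecond timestamp and 80 random bits into 26 characters."""
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes (80 bits)")
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):  # 26 characters x 5 bits, least significant group first
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def new_ulid() -> str:
    """Generate a ULID from the current time and OS-provided randomness."""
    return encode_ulid(time.time_ns() // 1_000_000, os.urandom(10))


print(new_ulid())
# An earlier timestamp always sorts before a later one as a plain string.
print(encode_ulid(1, bytes(10)) < encode_ulid(2, bytes(10)))  # True
```

Because the timestamp occupies the most significant bits and the alphabet is in ascending ASCII order, ordinary string comparison orders ULIDs by creation time at millisecond granularity.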
- The main advantage is that the IDs can be sorted. You can sort by creation order thanks to the timestamp that was encoded in the ULID when it was created.
- Similar to UUIDs, ULIDs are unique for practical purposes. With 80 random bits available in every millisecond, the probability of generating the same ID twice is extremely low.
- ULIDs don't use special characters, so they can be used in URLs or even HTML. Additionally, they are shorter than UUIDs, at 26 characters compared to UUIDs' 36 characters.
ULIDs also have some drawbacks:
- ULIDs embed a timestamp in the ID. Although this feature is useful, there are some use cases where timestamps would reveal sensitive information and, therefore, should be avoided.
- If you require sub-millisecond accuracy, the sorting capabilities provided by ULID might not be for you.
- There is no formal standard defining ULIDs; unlike UUIDs, they lack an RFC. This could be problematic if you need to work with vendors. However, because a ULID is also 128 bits long, it can be stored or represented as a UUID when necessary.
- ULID libraries exist for many programming languages. However, unlike UUIDs, there's no straightforward way to generate ULIDs from within the operating system without installing a programming language and a library.
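Two of the points above, the embedded timestamp and the ability to represent a ULID as a UUID, can be demonstrated with a small decoder. This is a sketch with helper names invented for the example. The sample string is simply a well-formed ULID used for illustration.

```python
import uuid
from datetime import datetime, timezone

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_to_int(ulid: str) -> int:
    """Decode a 26-character ULID into its 128-bit integer value."""
    if len(ulid) != 26:
        raise ValueError("a ULID has exactly 26 characters")
    value = 0
    for ch in ulid.upper():
        value = (value << 5) | CROCKFORD.index(ch)  # ValueError on invalid characters
    if value >= 2**128:
        raise ValueError("value does not fit in 128 bits")
    return value


def ulid_timestamp(ulid: str) -> datetime:
    """Recover the creation time stored in the top 48 bits."""
    milliseconds = ulid_to_int(ulid) >> 80
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)


def ulid_to_uuid(ulid: str) -> uuid.UUID:
    """Reinterpret the same 128 bits as a UUID, e.g. for a UUID column."""
    return uuid.UUID(int=ulid_to_int(ulid))


sample = "01ARZ3NDEKTSV4RRFFQ69G5FAV"
print(ulid_timestamp(sample))  # anyone holding the ID can read its creation time
print(ulid_to_uuid(sample))
```

Note that the resulting UUID only shares the ULID's 128 bits. Its version and variant fields are whatever those bits happen to be, so it should not be treated as a version 4 UUID.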
Applications of ULIDs
ULIDs are also suitable for many use cases, especially when sorting is required. Here are a few common applications for ULIDs.
- ULIDs are effective for data partitioning and indexing on NoSQL databases. You can shard or partition data easily while keeping convenient sorting.
- ULIDs are a good option when working with relational databases, for example, in cases where you wish to keep auto-incremented IDs private while still sorting and accessing items from another column.
- ULIDs are useful on distributed systems when you want to track the order of a list of events based on their IDs.
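As a small illustration of event ordering, the following Python sketch sorts a few hypothetical event records by their ULID strings alone. The event names and IDs are made up for the example. The three IDs differ in their timestamp portion (the first ten characters).

```python
# Hypothetical events collected from different services, in arrival order.
events = [
    {"id": "01ARZ50000TSV4RRFFQ69G5FAV", "name": "order_shipped"},
    {"id": "01ARYZ6S41TSV4RRFFQ69G5FAV", "name": "order_created"},
    {"id": "01ARZ3NDEKTSV4RRFFQ69G5FAV", "name": "order_paid"},
]

# Plain string sorting on the ID restores creation order; no timestamp column is needed.
ordered = sorted(events, key=lambda event: event["id"])
print([event["name"] for event in ordered])
# ['order_created', 'order_paid', 'order_shipped']
```

This ordering is only as good as the clocks on the machines that generated the IDs, and IDs created within the same millisecond are ordered by their random part rather than by creation time.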
UUIDs and ULIDs are unique identifiers you can use for decentralized ID generation with a negligible risk of collisions. Unlike traditional integer keys, UUIDs remain unique for practical purposes regardless of data store, device, or environment. Despite their widespread use in software development, UUIDs are not perfect. ULID is an excellent choice to consider if you want to generate a random value that is still sortable, a feature that random UUIDs lack. ULIDs also suit a number of other use cases, and some developers see them as a successor to UUIDs.