Authorization Code Java Example Base62

System Design: Scalable URL Shortener Service like TinyURL

1. URL Shortener service

2. Features

3. System Design goals

  • The service should be able to create a shortened URL/link for a long URL
  • Clicking the short URL should redirect the user to the original long URL
  • The shortened link should be as short as possible
  • Users can create a custom URL with a maximum length of 16 characters
  • The service should collect metrics such as the most-clicked links
  • Once generated, a shortened link should stay in the system for the lifetime of the service
  • The service should be up and running at all times
  • URL redirection should be fast and should not degrade at any point in time (even during peak loads)
  • The service should expose REST APIs so that it can be integrated with third-party applications

4. Traffic and System Capacity

Traffic

Storage

Memory

Estimates
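A quick back-of-the-envelope check ties together figures quoted elsewhere in this post: 120 billion stored links, 60 TB of storage, and 8,000 redirects per second. The roughly 500 bytes per record is not stated in the post; it is simply what 60 TB divided by 120 billion implies. The class and method names are illustrative.

```java
public class CapacityCheck {
    // Figures quoted elsewhere in this post
    static final long TOTAL_LINKS = 120_000_000_000L;             // 120 billion short links
    static final long TOTAL_STORAGE_BYTES = 60_000_000_000_000L;  // 60 TB (decimal terabytes)
    static final long READS_PER_SECOND = 8_000L;                  // redirects per second

    /** Average record size implied by the storage and link-count figures. */
    static long bytesPerRecord() {
        return TOTAL_STORAGE_BYTES / TOTAL_LINKS;
    }

    /** Redirects served per day at the quoted read rate. */
    static long readsPerDay() {
        return READS_PER_SECOND * 24 * 60 * 60;
    }

    public static void main(String[] args) {
        System.out.println("bytes per record: " + bytesPerRecord()); // 500
        System.out.println("reads per day: " + readsPerDay());       // 691200000
    }
}
```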

5. High Level Design

Fig. 1: A rudimentary design for the URL shortener service

Problems with this design:
  1. There is only one web server, which is a single point of failure (SPOF)
  2. The system is not scalable
  3. There is only a single database, which might not be sufficient for 60 TB of storage and a high load of 8,000 read requests per second

Changes made to address these issues:
  1. Added a load balancer in front of the web servers
  2. Sharded the database to handle the large volume of data
  3. Added a cache system to reduce load on the database

Fig. 2: Scalable high-level design

6. Algorithm REST Endpoints

7. Database Schema

User table:
  1. User ID: A unique user ID or API key that makes each user globally distinguishable
  2. Name: The name of the user
  3. Email: The email address of the user
  4. Creation Date: The date on which the user registered

URL table:
  1. Short URL: A unique short URL, 6 or 7 characters long
  2. Original URL: The original long URL
  3. User ID: The unique user ID or API key of the user who created the short URL

8. Shortening Algorithm

URL encoding through base62

  • Length 5: 62⁵ ≈ 916 million URLs
  • Length 6: 62⁶ ≈ 56.8 billion URLs
  • Length 7: 62⁷ ≈ 3.5 trillion (~3,500 billion) URLs
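The counts above, and the choice of 7 characters for the 120 billion links this design aims to store, can be checked with a short sketch (class and method names are illustrative). BigInteger is used so the arithmetic cannot overflow for longer lengths.

```java
import java.math.BigInteger;

public class KeySpace {
    static final BigInteger BASE = BigInteger.valueOf(62);

    /** Number of distinct codes of the given length over a 62-character alphabet. */
    static BigInteger combinations(int length) {
        return BASE.pow(length);
    }

    /** Smallest code length whose key space covers the required number of links. */
    static int minLength(long requiredLinks) {
        BigInteger needed = BigInteger.valueOf(requiredLinks);
        int length = 1;
        while (combinations(length).compareTo(needed) < 0) {
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        for (int len = 5; len <= 7; len++) {
            System.out.println(len + " chars: " + combinations(len));
        }
        System.out.println(minLength(120_000_000_000L)); // 7
    }
}
```

Six characters (about 56.8 billion codes) fall short of 120 billion, which is why 7 is the minimum.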
    import java.util.Random;

    // Class name is illustrative; DB.checkShortLinkExists(...) is an assumed data-access helper
    public class ShortLinkGenerator {
        private static final int NUM_CHARS_SHORT_LINK = 7;
        private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        private final Random random = new Random();

        public String generateRandomShortUrl() {
            char[] result = new char[NUM_CHARS_SHORT_LINK];
            while (true) {
                for (int i = 0; i < NUM_CHARS_SHORT_LINK; i++) {
                    // nextInt(bound) returns 0..bound-1, so pass the full length to reach every character
                    int randomIndex = random.nextInt(ALPHABET.length());
                    result[i] = ALPHABET.charAt(randomIndex);
                }
                String shortLink = new String(result);
                // make sure the short link isn't already used; otherwise draw a new one
                if (!DB.checkShortLinkExists(shortLink)) {
                    return shortLink;
                }
            }
        }
    }
  • Base 10 place values: 10⁰, 10¹, 10², 10³, etc.
  • Base 62 place values: 62⁰, 62¹, 62² = 3,844, 62³ = 238,328, etc.
    0: 0, 1: 1, 2: 2, 3: 3, ..., 9: 9
    10: a, 11: b, 12: c, ..., 35: z
    36: A, 37: B, 38: C, ..., 61: Z
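As a worked example with an arbitrary number, converting 1000 to base 62 with the digit mapping above (where 16 maps to g): 1000 ÷ 62 = 16 remainder 8, so

$$1000 = 16 \cdot 62^{1} + 8 \cdot 62^{0} \quad\Rightarrow\quad 1000_{10} = \texttt{g8}_{62}$$

The URLService code below additionally left-pads the result with '0' to a fixed width of 7 characters.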
    import java.util.HashMap;
    import java.util.Map;

    public class URLService {
        private static final String BASE_URL = "http://tiny.url/";
        // Digit alphabet: 0-9 -> '0'-'9', 10-35 -> 'a'-'z', 36-61 -> 'A'-'Z'
        private static final String ELEMENTS =
                "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

        private final Map<String, Long> ltos = new HashMap<>(); // long URL -> id
        private final Map<Long, String> stol = new HashMap<>(); // id -> long URL
        // Must be a long: 100000000000 does not fit in an int.
        // Starting at 100 billion makes every id encode to exactly 7 base62 digits.
        private long counter = 100_000_000_000L;

        public String longToShort(String url) {
            String shortUrl = base10ToBase62(counter);
            ltos.put(url, counter);
            stol.put(counter, url);
            counter++;
            return BASE_URL + shortUrl;
        }

        public String shortToLong(String url) {
            String key = url.substring(BASE_URL.length());
            long n = base62ToBase10(key);
            return stol.get(n);
        }

        public long base62ToBase10(String s) {
            long n = 0;
            for (int i = 0; i < s.length(); i++) {
                n = n * 62 + convert(s.charAt(i));
            }
            return n;
        }

        public int convert(char c) {
            if (c >= '0' && c <= '9') {
                return c - '0';
            }
            if (c >= 'a' && c <= 'z') {
                return c - 'a' + 10;
            }
            if (c >= 'A' && c <= 'Z') {
                return c - 'A' + 36;
            }
            throw new IllegalArgumentException("Not a base62 digit: " + c);
        }

        public String base10ToBase62(long n) {
            StringBuilder sb = new StringBuilder();
            while (n != 0) {
                sb.insert(0, ELEMENTS.charAt((int) (n % 62)));
                n /= 62;
            }
            // left-pad with '0' to a fixed 7-character code
            while (sb.length() < 7) {
                sb.insert(0, '0');
            }
            return sb.toString();
        }
    }

  • Hash the long URL using the MD5 algorithm and take only the first 7 characters of the hash as the TinyURL.
  • Different long URLs can share the same first 7 characters, so check the DB to verify that the TinyURL is not already in use.
  • If those 7 characters already exist in the DB, slide the 7-character window along the hash and try again, continuing until you find a unique value.
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class MD5Utils {
        private static final int SHORT_URL_CHAR_SIZE = 7;

        // Returns the MD5 digest of longURL as a 32-character lowercase hex string
        public static String convert(String longURL) {
            try {
                MessageDigest digest = MessageDigest.getInstance("MD5");
                byte[] messageDigest = digest.digest(longURL.getBytes(StandardCharsets.UTF_8));
                StringBuilder hexString = new StringBuilder();
                for (byte b : messageDigest) {
                    // %02x keeps the leading zero for byte values below 0x10
                    hexString.append(String.format("%02x", b));
                }
                return hexString.toString();
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e);
            }
        }

        // Slides a 7-character window along the hash until an unused key is found.
        // DB.exists(...) is an assumed data-access helper, not shown here.
        public static String generateRandomShortUrl(String longURL) {
            String hash = MD5Utils.convert(longURL);
            for (int start = 0; start + SHORT_URL_CHAR_SIZE <= hash.length(); start++) {
                String candidate = hash.substring(start, start + SHORT_URL_CHAR_SIZE);
                if (!DB.exists(candidate)) {
                    return candidate;
                }
            }
            // every window of this hash is already taken; the caller must handle this case
            throw new IllegalStateException("No unused short URL found in hash of " + longURL);
        }
    }
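One thing worth noting about the MD5Utils approach: it takes 7 characters of the hexadecimal digest, so each character carries only 16 possible values, giving 16⁷ = 268,435,456 possible codes, far fewer than the 120 billion links this design aims to store. A possible variant (an assumption, not the post's code; the class name is illustrative) encodes the digest in base62 instead: it reduces the 128-bit MD5 value modulo 62⁷ and writes the result as exactly 7 characters, so the full 62⁷ key space is available.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Base62 {
    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final int CODE_LENGTH = 7;
    // 62^7 = 3,521,614,606,208 possible 7-character codes
    private static final BigInteger KEY_SPACE = BigInteger.valueOf(62).pow(CODE_LENGTH);

    /** Maps a long URL to a 7-character base62 code derived from its MD5 digest. */
    public static String md5Base62(String longUrl) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5")
                    .digest(longUrl.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        // Treat the 16-byte digest as an unsigned number and reduce it into the 62^7 key space
        long value = new BigInteger(1, digest).mod(KEY_SPACE).longValueExact();
        char[] code = new char[CODE_LENGTH];
        for (int i = CODE_LENGTH - 1; i >= 0; i--) { // fill from the least significant digit
            code[i] = ALPHABET.charAt((int) (value % 62));
            value /= 62;
        }
        return new String(code);
    }
}
```

Different URLs can still map to the same code, so the DB existence check described above is still needed.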
  1. Being able to store a large number of short links (120 billion)
  2. The TinyURL should be as short as possible (7 characters)
  3. The application should be resilient to load spikes (for both URL redirections and short-link generation)
  4. Following a short link should be fast
  1. A single web server is a single point of failure (SPOF). If this web server goes down, none of our users will be able to generate tiny URLs or access the original (long) URLs behind them. This can be handled by adding more web servers for redundancy and putting a load balancer in front of them, but even with this design choice the next challenge comes from the database.
  2. For the database, we have two options:
    {
        _id: <ObjectId102>,
        shortUrl: "https://tinyurl.com/3sh2ps6v",
        originalUrl: "https://medium.com/@sandeep4.verma",
        userId: "sandeepv"
    }
  • A sharding key must be chosen
  • Schema changes must be applied across all shards
  • The mapping between sharding keys, shards (databases), and physical servers must be maintained
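To make the key-to-shard-to-server mapping concrete, here is a sketch assuming the short URL is the sharding key and shards are chosen by simple hash-modulo placement (both are assumptions; the class and server names are hypothetical):

```java
import java.util.List;

public class ShardRouter {
    private final List<String> shardServers; // index i -> physical server holding shard i

    public ShardRouter(List<String> shardServers) {
        if (shardServers.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardServers = List.copyOf(shardServers);
    }

    /** Sharding key -> shard index in [0, number of shards). */
    public int shardFor(String shortUrl) {
        // String.hashCode() is fixed by the String API specification, so every server computes
        // the same value; floorMod keeps the index non-negative when hashCode() is negative.
        return Math.floorMod(shortUrl.hashCode(), shardServers.size());
    }

    /** Shard index -> physical server. */
    public String serverFor(String shortUrl) {
        return shardServers.get(shardFor(shortUrl));
    }
}
```

With modulo placement, changing the number of shards remaps most keys; consistent hashing, or range-based assignment like the counter ranges described below, reduces that churn.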
    CREATE TABLE tinyUrl (
        id          BIGINT       NOT NULL AUTO_INCREMENT,
        shortUrl    VARCHAR(7)   NOT NULL,
        originalUrl VARCHAR(400) NOT NULL,
        userId      VARCHAR(50)  NOT NULL,
        PRIMARY KEY (id) -- an index is created automatically on the primary-key column
        -- INDEX (shortUrl)
        -- INDEX (originalUrl)
    );
  • From the ~3,500 billion possible URL combinations, take the first billion.
  • In ZooKeeper, maintain the ranges: divide the first billion into 100 ranges of 10 million each, i.e. range 1 -> (1–10,000,000), range 2 -> (10,000,001–20,000,000), …, range 100 -> (990,000,001–1,000,000,000). Add 100,000,000,000 to each range to get the counter values.
  • When servers are added, they ask ZooKeeper for an unused range. Suppose server W1 is assigned range 1: W1 generates tiny URLs by incrementing its counter and applying the base62 encoding. Every counter value is unique, so there is no possibility of collision and no need to check the DB to see whether the URL already exists; we can insert the mapping of the long URL to the short URL into the DB directly.
  • In the worst case, if one of the servers goes down, only its range of data is affected. We can replicate the master's data to its slaves, and while we try to bring the master back, we can divert read queries to the slaves.
  • If one of the databases reaches its maximum range or limit, we can move that instance out of the pool of active instances that accept writes, add a new database with a fresh range, and register it in ZooKeeper. The retired instance is then used only for reads.
  • Adding a new database is also easy: ZooKeeper assigns an unused counter range to the new database.
  • When the first billion is exhausted, we take the second billion and continue the process.
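The range scheme above can be sketched in a single process. This is a minimal simulation, not ZooKeeper client code: RangeCoordinator is a hypothetical in-memory stand-in for the range bookkeeping kept in ZooKeeper, and Worker plays the role of a web server such as W1. The constants (offset of 100 billion, 100 ranges of 10 million) come from the bullets above.

```java
import java.util.HashSet;
import java.util.Set;

public class CounterRanges {
    static final long OFFSET = 100_000_000_000L; // added to every range, as described above
    static final long RANGE_SIZE = 10_000_000L;  // 10 million ids per range
    static final int NUM_RANGES = 100;           // 100 ranges cover the first billion
    static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Hypothetical stand-in for ZooKeeper: hands out each unused range exactly once. */
    static class RangeCoordinator {
        private int nextRange = 0;

        synchronized long[] claimRange() {
            if (nextRange >= NUM_RANGES) {
                throw new IllegalStateException("first billion exhausted; move to the next billion");
            }
            long start = OFFSET + nextRange * RANGE_SIZE + 1;
            long end = OFFSET + (nextRange + 1) * RANGE_SIZE;
            nextRange++;
            return new long[] {start, end};
        }
    }

    /** A web server issuing ids from its own range, so no DB collision check is needed. */
    static class Worker {
        private final RangeCoordinator coordinator;
        private long next;
        private long end;

        Worker(RangeCoordinator coordinator) {
            this.coordinator = coordinator;
            long[] r = coordinator.claimRange();
            next = r[0];
            end = r[1];
        }

        String nextShortUrl() {
            if (next > end) { // range used up: ask the coordinator for a fresh one
                long[] r = coordinator.claimRange();
                next = r[0];
                end = r[1];
            }
            return toBase62(next++);
        }
    }

    static String toBase62(long n) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (n % 62)));
            n /= 62;
        } while (n > 0);
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        RangeCoordinator zk = new RangeCoordinator();
        Worker w1 = new Worker(zk); // gets range 1
        Worker w2 = new Worker(zk); // gets range 2
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            seen.add(w1.nextShortUrl());
            seen.add(w2.nextShortUrl());
        }
        System.out.println(seen.size()); // 2000: no collisions across servers
    }
}
```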

9. Cache
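The cache's behaviour is not spelled out here. As a minimal sketch, assuming a cache-aside pattern with least-recently-used (LRU) eviction (both assumptions, as are the class name and the database lookup function), the cache sits in front of the database on the redirect path:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class UrlCache {
    private final Map<String, String> lru;
    private final Function<String, String> database; // hypothetical lookup: short URL -> long URL

    public UrlCache(int capacity, Function<String, String> database) {
        this.database = database;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // evict the least recently used entry when full
            }
        };
    }

    /** Cache-aside read: serve from memory, otherwise query the database and cache the answer. */
    public String resolve(String shortUrl) {
        String longUrl = lru.get(shortUrl);
        if (longUrl == null) {
            longUrl = database.apply(shortUrl);
            if (longUrl != null) {
                lru.put(shortUrl, longUrl);
            }
        }
        return longUrl;
    }

    public int size() {
        return lru.size();
    }
}
```

LinkedHashMap is not thread-safe, so a shared instance would need synchronization; in practice a dedicated cache tier such as Memcached or Redis usually plays this role.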

10. Load Balancer (LB)

We can add a load-balancing layer at three places in the system:
  1. Between clients and application servers
  2. Between application servers and database servers
  3. Between application servers and cache servers
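No balancing policy is named here. Round-robin is a common starting point; a minimal thread-safe sketch of it, with hypothetical server names:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinBalancer {
    private final List<String> servers; // hypothetical backend addresses
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = List.copyOf(servers);
    }

    /** Returns the next server in rotation; safe to call from many threads. */
    public String pick() {
        // floorMod keeps the index valid even after the counter overflows to negative values
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```

Round-robin ignores server load; a least-connections policy is a typical refinement when request costs vary.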

11. Customer provided tiny URLs
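The design goals cap custom URLs at 16 characters. A minimal validation sketch, assuming custom aliases are restricted to the same base62 alphabet as generated codes, and using an in-memory set as a hypothetical stand-in for the database uniqueness check:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class CustomAliasValidator {
    private static final int MAX_LENGTH = 16; // limit stated in the design goals
    // Assumption: custom aliases use the same base62 alphabet as generated codes
    private static final Pattern ALLOWED = Pattern.compile("[0-9a-zA-Z]+");

    private final Set<String> taken = new HashSet<>(); // stand-in for a DB uniqueness check

    /** Reserves the alias if it is well-formed and unused; returns false otherwise. */
    public boolean reserve(String alias) {
        if (alias == null || alias.isEmpty() || alias.length() > MAX_LENGTH) {
            return false;
        }
        if (!ALLOWED.matcher(alias).matches()) {
            return false;
        }
        return taken.add(alias); // add() returns false if the alias is already taken
    }
}
```

A real service would also have to keep custom aliases from clashing with codes the generator can produce, for example by reserving a separate namespace for them.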

12. Analytics
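The design goals mention collecting metrics such as the most-clicked links. A minimal in-memory sketch (class and method names are hypothetical) counts clicks per short URL on each redirect and reports the top k:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class ClickAnalytics {
    private final Map<String, Long> clicks = new ConcurrentHashMap<>();

    /** Called on every redirect of shortUrl. */
    public void recordClick(String shortUrl) {
        clicks.merge(shortUrl, 1L, Long::sum);
    }

    /** The k most-clicked short URLs, highest count first (ties broken by code). */
    public List<String> topK(int k) {
        return clicks.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Counts held in memory are lost on restart; a real deployment would more likely persist click events and aggregate them separately from the redirect path.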


Source: https://medium.com/@sandeep4.verma/system-design-scalable-url-shortener-service-like-tinyurl-106f30f23a82
