Today, HTTP is the de facto way to transmit files across the internet. It works well for moving small files because the process is cheap. However, HTTP fails to take advantage of the file distribution techniques invented in the past decade. Upgrading it is nearly impossible, both because changes would break backward compatibility and because of the huge investment in the current model of HTTP and the web.
In the coming years, there will be new challenges such as:
- Moving petabyte-scale datasets.
- High-volume, real-time media streams.
- Permanence (preventing the disappearance of important files).
Nearly all of these problems boil down to high availability and larger data sizes. The InterPlanetary File System (IPFS) is a peer-to-peer, version-controlled filesystem that seeks to address these challenges.
To understand how IPFS works, it is essential to understand what a Distributed Hash Table (DHT) is.
Distributed Hash Tables:
A hash table is a data structure that stores key-value pairs in an array. It uses a hash function to compute an array index from each key, which makes looking up a value by its key very fast.
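To make the idea concrete, here is a minimal sketch of a hash table that resolves collisions by chaining. The class name `TinyHashTable` and the sample keys are made up for illustration; in practice, Python's built-in `dict` already does this job.

```python
class TinyHashTable:
    """Illustrative hash table using separate chaining (not production code)."""

    def __init__(self, num_buckets=8):
        # Each bucket holds a list of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # The hash function turns the key into an array index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only one bucket has to be scanned, not the whole table.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = TinyHashTable()
table.put("cat.jpg", "stored on disk A")
table.put("dog.jpg", "stored on disk B")
print(table.get("cat.jpg"))
```

The lookup only inspects the bucket that the key hashes to, which is why it stays fast as the table grows.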
A Distributed Hash Table (DHT) is a distributed system that provides access to a key-value store. The store is spread over the participating nodes and offers excellent performance and scalability. DHTs are widely used in peer-to-peer systems to coordinate and maintain metadata about the system.
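The sketch below shows one possible way to spread a key-value store over several nodes: each key is stored on the node whose ID is closest to the key's hash, using XOR as the distance, as Kademlia-style DHTs do. It is a single-process toy under stated assumptions: the node names, the `ToyDHT` class, and the choice of SHA-256 are illustrative, and a real DHT routes these requests over the network.

```python
import hashlib


def key_id(data: bytes) -> int:
    """Map arbitrary bytes into a 256-bit integer ID space using SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


class ToyDHT:
    """In-process simulation of a DHT: every node owns part of the key space."""

    def __init__(self, node_names):
        # node ID -> that node's local key-value store
        self.nodes = {key_id(name.encode()): {} for name in node_names}

    def _responsible_node(self, key: str) -> int:
        k = key_id(key.encode())
        # Pick the node whose ID has the smallest XOR distance to the key's ID.
        return min(self.nodes, key=lambda node_id: node_id ^ k)

    def put(self, key: str, value):
        self.nodes[self._responsible_node(key)][key] = value

    def get(self, key: str):
        return self.nodes[self._responsible_node(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("some-key", "some-value")
print(dht.get("some-key"))
```

Because every participant computes the same responsible node for a given key, no central index is needed to find where a value lives.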
IPFS network illustration
A NodeId identifies a node in the IPFS system. A NodeId is simply the hash of the node's public key. Nodes can change their NodeId at any time, but they are incentivized to keep the same one. Nodes store the objects they are interested in in their local storage. These objects represent files and other data structures in IPFS.
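As a rough illustration of deriving an identifier from a public key, the sketch below hashes a key's bytes with SHA-256. The random bytes merely stand in for a real public key, and the function name, hash function, and hex encoding are assumptions for this example rather than the exact format IPFS uses.

```python
import hashlib
import os


def node_id_from_public_key(public_key: bytes) -> str:
    """Derive an identifier by hashing the public key bytes (illustrative format)."""
    return hashlib.sha256(public_key).hexdigest()


# Placeholder: random bytes standing in for a real public key.
fake_public_key = os.urandom(32)
node_id = node_id_from_public_key(fake_public_key)
print(node_id)
```

The same key always yields the same identifier, so a node keeps its NodeId for as long as it keeps its key pair.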
All the nodes maintain a DHT that is used to find:
- Network address of other peers in the network, and
- Peers who can serve a particular object
This DHT allows IPFS to find the peers that can serve an object and then reach those peers over the network.
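The two kinds of lookup described above can be sketched with two plain dictionaries standing in for the DHT. The peer names, the object key, the addresses (taken from a documentation-only IP range), and the function names are all hypothetical.

```python
# Toy stand-ins for the two kinds of records kept in the DHT.
peer_addresses = {
    "peer-1": "/ip4/192.0.2.10/tcp/4001",
    "peer-2": "/ip4/192.0.2.20/tcp/4001",
}
providers = {
    "object-abc": ["peer-1", "peer-2"],
}


def find_providers(object_key):
    """Which peers can serve this object?"""
    return providers.get(object_key, [])


def find_peer(peer_id):
    """At which network address can this peer be reached?"""
    return peer_addresses.get(peer_id)


def locate(object_key):
    """Combine both lookups: peers serving the object, with their addresses."""
    result = []
    for peer_id in find_providers(object_key):
        address = find_peer(peer_id)
        if address is not None:
            result.append((peer_id, address))
    return result


reachable = locate("object-abc")
print(reachable)
```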
IPFS finding object from network peers
IPFS identifies content by content addressing: the address of an object is derived from the content itself. This is different from the web today, where content is addressed by the server that hosts it.
An IPFS content address resolves to an IPFS Object, which contains a list of IPFS Links and some content data. When a huge file is added to IPFS, it is broken into many smaller chunks, and the address resolves to a list of IPFS Links that point to the chunks. Content addressing ensures that an address will always return the same file.
Another benefit of this approach is that as long as at least one node has the content, it can be accessed from the IPFS network. This solves the dead-link problem that exists on today's internet. Duplicate files do not take up additional space because identical content always maps to the same content hash. Having just one copy in the network is enough for the content to be retrievable.
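The chunking and deduplication described above can be sketched as follows. This is a simplified model, not the real IPFS object format: the in-memory `block_store`, the tiny `CHUNK_SIZE`, the newline-separated list of links, and the use of SHA-256 hex digests as addresses are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example produces several chunks

block_store = {}  # content address -> raw bytes, standing in for the network


def put_block(data: bytes) -> str:
    """Store a block under the SHA-256 hash of its content and return that address."""
    address = hashlib.sha256(data).hexdigest()
    block_store[address] = data  # identical content lands on the same key
    return address


def add_file(content: bytes) -> str:
    """Split content into chunks, store each one, then store a root object of links."""
    links = [
        put_block(content[i:i + CHUNK_SIZE])
        for i in range(0, len(content), CHUNK_SIZE)
    ]
    return put_block("\n".join(links).encode())


def read_file(root_address: str) -> bytes:
    """Follow the links in the root object and reassemble the chunks in order."""
    links = block_store[root_address].decode().split("\n")
    return b"".join(block_store[link] for link in links if link)


root = add_file(b"hello, interplanetary world")
blocks_after_first_add = len(block_store)
same_root = add_file(b"hello, interplanetary world")  # adding a duplicate
print(root == same_root, len(block_store) == blocks_after_first_add)
```

Adding the same bytes a second time returns the same root address and stores nothing new, which is the deduplication property in miniature.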
IPFS: Content addressed peer-to-peer file system
Given the content address, the IPFS network responds with the peers that have the required objects. The object and its links can be received simultaneously from multiple peers.
Once IPFS returns the object from the network, our node can share it with other peers in turn. The node can also verify that the content has not been tampered with, because the address is the hash of the content. Since the content of the file generates the content address (hash), the address changes whenever the file is modified.
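A minimal sketch of that verification step, assuming the content address is a plain SHA-256 hex digest (the real IPFS address format differs) and using hypothetical function names:

```python
import hashlib


def content_address(data: bytes) -> str:
    """Compute an illustrative content address: the SHA-256 hash of the data."""
    return hashlib.sha256(data).hexdigest()


def verify(address: str, data: bytes) -> bool:
    """Check that received data really hashes to the address that was requested."""
    return content_address(data) == address


original = b"quarterly report, version 1"
address = content_address(original)

received_ok = verify(address, original)
received_tampered = verify(address, b"quarterly report, version 1 (edited)")
print(received_ok, received_tampered)
```

The receiving node does not need to trust the peer that sent the data; recomputing the hash is enough to detect any change.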
It is not convenient to share a new content address every time a file changes. To tackle this issue, IPFS has the concept of IPNS, the InterPlanetary Name System, which provides a stable name that can be updated to point to the latest content address.
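The idea of a stable name pointing at changing content can be sketched as a small mutable lookup table. This is only a conceptual model: the class `ToyNameSystem`, the name string, and the SHA-256 addresses are hypothetical, and the sketch omits the signing of records that a real naming system needs so that only the owner can update a name.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Illustrative content address: the SHA-256 hash of the data."""
    return hashlib.sha256(data).hexdigest()


class ToyNameSystem:
    """Maps a stable name to whatever content address was published last."""

    def __init__(self):
        self._records = {}

    def publish(self, name: str, address: str) -> None:
        self._records[name] = address  # newer record replaces the older one

    def resolve(self, name: str) -> str:
        return self._records[name]


names = ToyNameSystem()
v1 = content_address(b"my blog, first draft")
v2 = content_address(b"my blog, second draft")

names.publish("my-blog", v1)
first_resolution = names.resolve("my-blog")
names.publish("my-blog", v2)  # the file changed, the shared name did not
second_resolution = names.resolve("my-blog")
print(first_resolution != second_resolution)
```

Readers keep using the same name, while each resolution returns the content address of the most recently published version.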
IPFS provides a generic way to share files and data in a peer-to-peer fashion while optimizing delivery. Some use cases where IPFS would be a good fit are:
- Sharing files and huge datasets.
- Serving websites and blogs on IPFS.
- Using it as a data source or sink in data processing workflows.
- Using it as a mounted filesystem.
The IPFS white paper discusses a peer-to-peer, version-controlled approach to a filesystem. This filesystem has no single point of failure. Since the data is content-addressed, the nodes in the network don't need to trust each other. As per this article, IPFS has the potential to replace HTTP and make the web distributed.