Hello World, Weavechain here, bringing big data to Web3

Omar ElNaggar
Apr 6, 2022

I’m thrilled to announce the launch of Weavechain, a solution for bringing big data to Web3. They said big data could never work with blockchain technology because of performance and governance requirements; Weavechain solves that. Critical data sets are migrating to Web3 because it’s fundamentally more secure and valuable, and Weavechain makes that migration not only feasible for big data but easy (no rip and replace, no bespoke Web3 engineering).

Our vision is to build a future with strong data ownership rights, increased collaboration, and trust in technology driven by cryptographic truths. Weavechain is building Web3 infrastructure to achieve that vision starting with big data, so much of what follows is technical. I highly recommend starting with my article Intro to Web3 and Why It’s Valuable (TL;DR: it’s more secure and more valuable) to get up to speed on the fundamentals of Web3.

Weavechain has three attributes that are key to Web3 big data:

  1. Performance that’s big data ready (>1M TPS target, 110k TPS today): Weavechain’s smart hashing solution acts like a layer-2 protocol for big data, and the result is blazing fast performance that is comparable to working with the DB directly.
  2. Architecture that satisfies enterprise data governance standards: Weavechain uses a pattern of private permissioned sidechains that are networked through a public permissionless mainnet to get the best of security and interoperability.
  3. Interoperability for integrating Web3 instead of rip and replace: Weavechain connects to any database and then gives you all of the benefits of Web3 without the need for new engineering expertise or risky rebuilds.

What’s most exciting to us is why enabling Web3 big data is so powerful. We’re working with design partners right now who are excited to benefit from verifiable data lineage, immutable forensic data, collaborative big data sets, big data monetization, and access to Web3 consumers. I’ll discuss these use cases, and then wrap up with what’s next for us.

The time is now. Please feel free to ping me on Twitter or LinkedIn to learn how you can reap the benefits of Web3. In the words of Web3, LFG!

How Weavechain enables Web3 big data

We started Weavechain because we ran into three key challenges when trying to establish Web3 big data sets for Mana.ai:

1) Insufficient performance to handle big data sets

2) Enterprise data governance incompatibility with public blockchains

3) Difficulty interacting with existing technology

As we developed a solution, we saw the value and potential in solving this generically, and shifted our focus to Weavechain. These challenges are a big part of why enterprise blockchain adoption has been limited to date (especially data governance). You can’t lock down security logs unless you can handle an incredible amount of data, and do so both securely and easily. Weavechain makes this possible.

1) Performance that’s big data ready (>1M TPS target, 110k TPS today)

Weavechain’s defining feature is its ability to handle big data sets in combination with blockchain technology. It does this by leaving the data itself in the high-performance databases that already handle today’s requirements, and then putting an optimized hash of that data on the blockchain. I’ll repeat myself for emphasis: the blockchain itself only has a hash of the big data set, not the data itself.

The end result is a big data set that has a cryptographic guarantee of immutability, backed by the consensus of a trust network. If any of the participating nodes have a hash mismatch during an integrity check, they’re excluded from activity until they can be reconciled.
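To make the integrity check concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Weavechain’s actual implementation: the function names, the use of SHA-256 over canonical JSON, the simple majority rule, and the sample rows are all invented for this example.

```python
import hashlib
import json
from collections import Counter


def dataset_digest(rows):
    """Hash rows in a canonical form so equal data always gives an equal digest."""
    h = hashlib.sha256()
    for row in rows:
        # sort_keys and fixed separators keep the encoding stable across nodes
        h.update(json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def integrity_check(node_rows):
    """Return (consensus_digest, excluded_nodes) for a dict of node name -> rows."""
    digests = {name: dataset_digest(rows) for name, rows in node_rows.items()}
    consensus, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        raise RuntimeError("no majority digest; the network cannot agree")
    excluded = sorted(name for name, d in digests.items() if d != consensus)
    return consensus, excluded


# Made-up sample data: node_c holds a row that differs from the other two nodes.
rows = [{"id": 1, "speed": 42}, {"id": 2, "speed": 57}]
tampered = [{"id": 1, "speed": 42}, {"id": 2, "speed": 99}]
nodes = {"node_a": rows, "node_b": rows, "node_c": tampered}

consensus, excluded = integrity_check(nodes)
print(excluded)  # the nodes whose digest differs from the majority
```

Only the digest would need to live on the blockchain in a design like this; the rows stay in each node’s database, and a node that disagrees with the majority digest is set aside until it is reconciled.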

From that baseline, our secret sauce is optimizing every step of the system to minimize the lag relative to interacting with the database directly. We’ve achieved 110k TPS with 3 seconds to finality right now, while we’re still in our private beta. We’re confident that within one year our technology will support over 1 million transactions per second, a lower time to finality, and a performance penalty of under 20% compared to working with the database directly.

One last bit here: I assert that to achieve a trustworthy cryptographic guarantee of immutability, the number of data stewards in a trust network can be much lower in private permissioned networks where participants are reputable entities.

  1. In a permissionless scenario, you must assume that participants can be malicious, and thus establish free market incentives such that decentralization actually supports security. In this case, Nakamoto coefficients make sense as a measure of a network’s decentralization, which in turn implies its robustness.
  2. In a permissioned scenario, the trust network could consist of Toyota, Ford, GM, and Honda. These companies have a reputational disincentive to cheat or be malicious when collaborating, and a hacker would have to breach a majority of nodes (51%, which here means 3 of the 4) to compromise the network. Compromising even one of these participants would be prohibitively difficult, let alone 3 simultaneously. It’s often impossible to guarantee that any security setup is 100% programmatically impregnable; instead, the goal must be to make compromise prohibitively impractical.
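The arithmetic behind both scenarios can be sketched in a few lines of Python. This is a simplified illustration: `majority` and `nakamoto_coefficient` are names chosen for this example, a strict majority stands in for the "51%" rule, and the share values are made up.

```python
def majority(total_nodes):
    """Smallest number of nodes that is strictly more than half of the network."""
    return total_nodes // 2 + 1


def nakamoto_coefficient(shares):
    """Fewest participants whose combined share exceeds half of the total."""
    total = sum(shares)
    running = 0
    for count, share in enumerate(sorted(shares, reverse=True), start=1):
        running += share
        if running * 2 > total:
            return count
    return len(shares)


# Four equally weighted stewards, as in the Toyota / Ford / GM / Honda example.
print(majority(4))                             # nodes an attacker must breach
print(nakamoto_coefficient([1, 1, 1, 1]))      # same answer for equal shares
print(nakamoto_coefficient([60, 20, 10, 10]))  # one dominant participant
```

With equal shares the two measures agree, which is why a small permissioned network of reputable participants can still demand that several independent organizations be compromised at once.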

2) Architecture that satisfies enterprise data governance standards

I don’t think that sensitive enterprise data should live on public blockchains, and we’ve designed Weavechain so that data can remain in the same controlled environments it lives in today. Firewalls prevent even encrypted data from being touched inappropriately; they’re a critical line of defense. Putting that data on a public blockchain effectively removes that firewall protection and lets anybody hold an encrypted copy of the data. That’s unacceptable, which is why most Enterprise Web3 projects build bespoke private blockchains that are hard to build and maintain. In addition to the risk of keys leaking, there’s the risk of quantum computing advancing to the point that modern encryption becomes irrelevant. And once something is on a blockchain, it’s there forever.

Weavechain solves this with a pattern using multiple sidechains that can be private and permissioned. Let’s say that MOBI, the consortium of car manufacturers, decides to use Weavechain to create a shared database of its MOBI Trusted Trip car telematics data in order to track carbon emissions. Each member of the consortium would run a Weavechain node in its secure network, communicating with other nodes in this sidechain and maintaining the shared database on its servers. Data access is explicitly permissioned, data wouldn’t move outside of permissioned paths, and security standards would be maintained amongst participants.

These sidechains are connected via a public permissionless Mainnet that understands where data lives for routing purposes. Say a new research team wants to get access to the MOBI data. MOBI would have a public listing on the Mainnet so that researchers can be aware that the data exists, and request access. But nothing sensitive would exist in this public forum. Our vision is to build a protocol that is open source and governed in a way that is decentralized, public, and permissionless, but that supports data that is private, permissioned, and enterprise secure. We’re calling this a “Decentralized network of DataDAOs”, and will elaborate soon.
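The split between a public listing and private data can be sketched as follows. This is a toy model in Python, not the Weavechain protocol: the `Listing`, `Sidechain`, and `Mainnet` classes, their methods, and the sample row are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """Public metadata only: no rows, no credentials, no connection details."""
    dataset: str
    steward: str
    description: str


@dataclass
class Sidechain:
    """Private, permissioned side: holds the data and the access list."""
    listing: Listing
    rows: list
    approved: set = field(default_factory=set)
    pending: set = field(default_factory=set)

    def request_access(self, who):
        self.pending.add(who)

    def approve(self, who):
        self.pending.discard(who)
        self.approved.add(who)

    def read(self, who):
        if who not in self.approved:
            raise PermissionError(f"{who} has no access to {self.listing.dataset}")
        return list(self.rows)


class Mainnet:
    """Public directory that knows where data lives, never what it contains."""

    def __init__(self):
        self._listings = {}

    def publish(self, listing):
        self._listings[listing.dataset] = listing

    def search(self, term):
        term = term.lower()
        return [l for l in self._listings.values() if term in l.description.lower()]


# Made-up example: a research team discovers the data publicly, then asks for access.
listing = Listing("mobi-trusted-trip", "MOBI", "Vehicle telematics for tracking carbon emissions")
mobi = Sidechain(listing, rows=[{"trip": 1, "co2_kg": 3.2}])
mainnet = Mainnet()
mainnet.publish(mobi.listing)

found = mainnet.search("carbon")
mobi.request_access("research_team")
mobi.approve("research_team")
print(found[0].steward, mobi.read("research_team"))
```

The design choice worth noticing is that `Mainnet` only ever receives a `Listing`, so nothing sensitive can leak through discovery; reads always go through the sidechain’s own permission check.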

This is a departure from many Web3 protocols that emphasize decentralization and an open ledger as requirements. Our position is that enterprise vendors deserve to continue being trusted stewards of our data in Web3. I trust Toyota with my vehicle information today, and I’ll have even more confidence in the MOBI consortium of Toyota and its competitors, without forcing them to breach data security. The public decentralized web is great for many things, but not required for everything.

3) Interoperability for integrating Web3 instead of rip and replace

We’ve designed Weavechain to sit right on top of existing databases, giving them Web3 properties instead of trying to rip them out and replace them with a Web3 solution. Historically, the level of effort and risk involved in enterprise blockchain implementations has been just as much of a deal breaker as any technical limitations, and we aim to change that.

The first step is accessing the data itself, and Weavechain already supports 18 different database formats. That means that we can hash the data from any of those databases, as well as replicate data back into them, regardless of the format. It’s pretty exciting watching an InfluxDB instance replicate into MongoDB! We’re building out adapters so that customers can interact through Weavechain precisely how they did when they connected directly to the database.
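One way to picture format-independent hashing and replication is a pair of adapters that share a tiny interface. The sketch below uses two in-memory stand-ins (CSV text and JSON text) rather than real database drivers; the class names, `read_rows`/`write_rows`, and the sample readings are assumptions made for this example.

```python
import csv
import hashlib
import io
import json


class CsvStore:
    """Stand-in for one storage format: rows kept as CSV text."""

    def __init__(self, text=""):
        self.text = text

    def read_rows(self):
        return [dict(r) for r in csv.DictReader(io.StringIO(self.text))]

    def write_rows(self, rows):
        buf = io.StringIO()
        if rows:
            writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        self.text = buf.getvalue()


class JsonStore:
    """Stand-in for a second storage format: rows kept as a JSON array."""

    def __init__(self, text="[]"):
        self.text = text

    def read_rows(self):
        return json.loads(self.text)

    def write_rows(self, rows):
        self.text = json.dumps(rows)


def digest(store):
    """Hash the rows, not the bytes on disk, so the storage format does not matter."""
    canonical = sorted(
        json.dumps({k: str(v) for k, v in row.items()}, sort_keys=True)
        for row in store.read_rows()
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def replicate(source, target):
    target.write_rows(source.read_rows())


source = CsvStore("sensor,reading\ns1,20.5\ns2,19.0\n")
target = JsonStore()
replicate(source, target)
print(digest(source) == digest(target))  # the two formats hold the same rows
```

Because the digest is computed over canonicalized rows, the same data produces the same hash whichever store holds it. Sorting the rows makes this digest insensitive to row order, which is a simplification that a real system may not want.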

The second step is where to put the hashes, and Weavechain supports 8 different blockchain formats today, including our own Layer-1 protocol. We’ve seen how painful it is to get an enterprise to ratify and switch to a new database format, and we predict that blockchain technology will follow a similar pattern. If your security and legal teams have cleared you to build on Ethereum or Corda, for example, switching to Solana would take significant effort. While each blockchain still has its unique attributes, we’re structuring Weavechain to be able to put the hashes on as many of them as possible. And for those without an existing footprint, our custom Layer-1 protocol is going to be super optimized to have the highest performance for storing hashes.
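Keeping the hash destination pluggable might look like the sketch below. It is an assumption-laden illustration: the `Ledger` interface, the `InMemoryLedger` stand-in, and the ledger names are hypothetical, and no real blockchain client is involved.

```python
import hashlib
from typing import Iterable, Protocol


class Ledger(Protocol):
    """Anything that can store a digest and later confirm it is there."""
    name: str

    def anchor(self, digest: str) -> int: ...

    def contains(self, digest: str) -> bool: ...


class InMemoryLedger:
    """Stand-in for a real chain; the returned position plays the role of a receipt."""

    def __init__(self, name):
        self.name = name
        self._entries = []

    def anchor(self, digest):
        self._entries.append(digest)
        return len(self._entries) - 1

    def contains(self, digest):
        return digest in self._entries


def anchor_everywhere(digest: str, ledgers: Iterable[Ledger]) -> dict:
    """Write the same digest to every configured ledger and collect the receipts."""
    return {ledger.name: ledger.anchor(digest) for ledger in ledgers}


ledgers = [InMemoryLedger("ledger_a"), InMemoryLedger("ledger_b")]
digest = hashlib.sha256(b"example dataset").hexdigest()
receipts = anchor_everywhere(digest, ledgers)
print(receipts)
```

The point of the interface is that switching or adding a chain means writing one more class that satisfies `Ledger`, while the hashing code stays untouched.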

The last step is how to make the data accessible to others, especially in Web3. Web2 is easy: enough services have APIs set up that we can expose an endpoint and track the egress. But making data available for consumption by a Web3 smart contract is trickier. There’s a new pattern emerging amongst Oracle providers (e.g., Chainlink, API3) that involves materializing data to smart contracts. Knowledge of how to navigate these new patterns is not widespread, and hiring a team of blockchain engineers is not desirable. That’s where Weavechain comes in. We’re building out endpoint integrations to connect to Web3 Oracles, Web2 common services, and more. Weavechain acts as middleware to Web3 just like the integration platforms before it: connect once to Weavechain, and then we’ll connect you to any Web3 endpoint.

Our goal is to ensure that customers can focus on their business, while we manage the infrastructure and plumbing that brings them into the future.

Why Big Data will migrate to Web3

Web3 data is more secure and more valuable than Web2 data (as discussed here), and those benefits are what’s incentivizing enterprise Web3 investment. On the security side, it’s all about the ability to establish attestations that are backed by cryptographic guarantees, and we’re seeing early interest in immutable forensic data and data lineage. On the value side, we think that data economies will unlock an explosion in profitable data collaborations. While we’re more excited about the value potential, I’ll start with security because it’s simpler.

Web3 Big Data is more secure

Remember the two security properties that Web3 data gains over its predecessors: immutability, and protection with a distributed defense. The question we asked security experts was, what are the data sets that benefit from these properties? Our diligence led us to build out Weavechain to support immutable forensic data like security logs, and immutable data lineage for proof of data movement and processing.

When a malicious actor breaches a network, their first task is often (73% of the time according to ZDNet) to cover their tracks by deleting the logs of their activity, an Indicator Removal on Host scenario. The result is that it takes longer to identify a breach, and even then it’s harder to figure out what happened and how. We can prevent this by using Weavechain to secure these big data logs using Web3 technology. The White House issued an executive order in May 2021 asking for logs to be protected with cryptographic methods that ensure their integrity, and we’re preparing to support those implementations.
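A hash chain is one common way to make log deletion detectable, and a minimal Python sketch of the idea follows. It is not Weavechain’s implementation: the function names, the all-zero genesis value, and the log messages are invented, and the "anchored head" simply stands for a hash stored somewhere the attacker cannot rewrite.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for the first entry


def link(prev_hash, message):
    """Each entry's hash covers the previous hash, chaining the log together."""
    return hashlib.sha256((prev_hash + "\n" + message).encode("utf-8")).hexdigest()


def append(log, message):
    prev = log[-1][1] if log else GENESIS
    log.append((message, link(prev, message)))


def verify(log, anchored_head):
    """Recompute the chain and compare its head with the externally anchored one."""
    prev = GENESIS
    for message, entry_hash in log:
        if link(prev, message) != entry_hash:
            return False
        prev = entry_hash
    return prev == anchored_head


log = []
for message in ["login admin", "read /etc/passwd", "logout admin"]:
    append(log, message)
head = log[-1][1]  # this value is what would be anchored off the host

# An intruder removes the incriminating entry from the local log.
tampered = log[:1] + log[2:]

# A more careful intruder rebuilds the whole chain without that entry.
rewritten = []
for message in ["login admin", "logout admin"]:
    append(rewritten, message)

print(verify(log, head), verify(tampered, head), verify(rewritten, head))
```

Deleting an entry breaks the chain, and rebuilding the chain changes the head, so either way the mismatch with the anchored hash exposes the cover-up.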

Outside of forensics, there’s demand to better understand data’s lineage. With Weavechain, we create an immutable record of 1) when data enters the Web3 network (e.g., via API or DB access), 2) every movement between Web3 nodes and smart contracts, 3) every time it’s transformed within the Web3 network, and 4) every time it leaves Web3 via a Web2 API. I like thinking about this from a personal data privacy perspective first, where I want to know what data Facebook collects about me and where they’re selling it. In an enterprise context though, we’re working with financial firms looking to ensure the use of regulated sources, as well as with researchers looking to provide assertions that specific source data generated specific results.
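The four kinds of lineage events listed above can be modeled as a small append-only log. The sketch below is illustrative only: the `Event` names, `LineageRecord`, `LineageLog`, and the sample dataset and details are all made up, and immutability is only approximated here with frozen records.

```python
from dataclasses import dataclass
from enum import Enum


class Event(Enum):
    ENTERED = "entered the network"
    MOVED = "moved between nodes or contracts"
    TRANSFORMED = "transformed inside the network"
    EXITED = "left via a Web2 API"


@dataclass(frozen=True)
class LineageRecord:
    dataset: str
    event: Event
    detail: str


class LineageLog:
    """Append-only list of lineage records with two simple queries."""

    def __init__(self):
        self._records = []

    def record(self, dataset, event, detail):
        self._records.append(LineageRecord(dataset, event, detail))

    def history(self, dataset):
        return [r for r in self._records if r.dataset == dataset]

    def exits(self, dataset):
        return [r.detail for r in self.history(dataset) if r.event is Event.EXITED]


lineage = LineageLog()
lineage.record("prices", Event.ENTERED, "loaded via DB access")
lineage.record("prices", Event.MOVED, "node_a -> node_b")
lineage.record("prices", Event.TRANSFORMED, "daily averages computed")
lineage.record("prices", Event.EXITED, "served to reporting API")
lineage.record("other", Event.ENTERED, "loaded via API")

print([r.event.name for r in lineage.history("prices")])
print(lineage.exits("prices"))
```

A query like `exits` is the enterprise version of the personal question in the text: where did this data go once it left the network?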

Web3 Big Data is more valuable

Once you can establish trust in the security of data, how it moves, and who owns it, a world of collaborative opportunities opens up. We’re working with pioneers in the Decentralized Science movement (DeSci) to enable collaborative big data sets where multiple entities can contribute while still maintaining enterprise-level security. Combined with the use of verifiable credentials, we’re heralding an era that will end both isolated research environments and cumbersome email requests to authorize access to an old FTP server. And even better, these techniques enable scalable personal data ownership, where individuals can decide at any time whether they want to share all of their data, anonymized data, or nothing at all.
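The three sharing choices at the end of that paragraph can be sketched as a simple preference switch. This is a toy example: the `Sharing` enum, the `IDENTIFYING_FIELDS` set, and the sample record are hypothetical, and dropping a couple of fields is far weaker than real anonymization.

```python
from enum import Enum


class Sharing(Enum):
    ALL = "all"
    ANONYMIZED = "anonymized"
    NOTHING = "nothing"


# Hypothetical list of fields treated as directly identifying in this example.
IDENTIFYING_FIELDS = {"name", "email"}


def shared_view(record, preference):
    """Return what a data consumer may see under the owner's current preference."""
    if preference is Sharing.ALL:
        return dict(record)
    if preference is Sharing.ANONYMIZED:
        return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    return None


record = {"name": "Ada", "email": "ada@example.com", "steps": 8042}
for preference in Sharing:
    print(preference.value, shared_view(record, preference))
```

Because the preference is checked on every read, the owner can change it at any time and the next consumer sees the new view.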

Weavechain makes it easy to connect big data sets to modern access paradigms, including via Oracles for smart contract consumption. Activity in Web3 smart contracts is increasing rapidly, but legacy data providers don’t have the in-house expertise required to navigate the complexities of various Oracles. Weavechain has that expertise, and we’re actively working with financial market data providers who are looking to sell their exact same data products to Web3 consumers.

For that to happen, Weavechain also makes it easy to monetize data production and consumption. As discussed in my Intro to Web3 article, putting assets on liquid markets, making them tradable, and establishing incentives all increase the value and reach of that data. In addition to academics and financial market data providers, we’re working with IoT companies looking to take the big data sets that their sensors generate and make them available for easy purchase. We’re going to support data marketplaces in maturing from their $1bn-per-year niche today to the 15% share of all decentralized services that Chainlink predicts for 2025.

Weavechain is the on-ramp for big data to Web3, and for a foundational primitive like this the possibilities are endless.

What’s Next

We’re currently in a private beta period and are working with design partners on our first implementations of Weavechain. Our target is to move into a public beta by the summer of 2022, with a Mainnet launch in early 2023.

While we work on getting our MVP out the door, we’re having fantastic conversations with thought leaders and prospects about where they see potential for Web3 big data. Our architecture has been designed to satisfy today’s Web3 big data needs while establishing ourselves as a Web3 big data primitive for the future. A few exciting paths we’re exploring as well:

  • Confidential Computing: Technology is going mainstream right now that allows computation to happen against data sets without sharing the raw data itself. We love this, and while pursuing our vision of a world with strong data ownership rights, we want data to be replicated and exposed as little as possible. Combining fields like decentralized science with confidential computing furthers that.
  • Datasphere Analytics: As more data joins Web3, it will become securely accessible for large-scale, cross-company analytics. Especially combined with confidential computing, we’re seeing novel datasets emerge like global carbon emissions from vehicles through the MOBI consortium with their MOBI Trusted Trip standard. Weavechain wants to build the analytics layer on top of these secure Web3 data sets the way companies like Dune Analytics have done with public data sets.
  • Personal Data Brokerage: Once Web3 paradigms stabilize for enterprises, it will become easy for individuals to apply the same patterns for their own data. Eve Maler wrote a fantastic post on Personal Data Brokerage, and we’re going to support these efforts to make it so that you can be the owner of your data.

We’d love to chat with you about what we’re doing, what you’re doing, and how we can help you create your Web3 story. Our job is to navigate the complexity to figure out how Web3 can improve your work, and we’re confident that Weavechain can help. To connect with us,

Special thanks to Ioan Moldovan, Leticia Shaaban, Reilly Xu, everybody who helped with editing, and everybody who took the time to read!

Omar ElNaggar

Technology evangelist, Eagles fanatic, H+, cartoon/video game nerd. Excelsior!