Cybersecurity Risks of Web 3.0
(An edited version of this post first appeared in Security Magazine.)
A new web for the Internet brings great promise and great risks, but we can't manage those risks until we define what it is.
What is Web 3.0?
You can't secure something if you can't describe it. The original "web 1.0" was a place to serve static pages built by companies. Along came forums and social media, and we suddenly had a "web 2.0" in which users created and added content. Tim Berners-Lee (inventor of the original web) described a "Semantic Web", now often called web 3.0: a web based on data that machines could process, not just people. If web 1.0 created an encyclopedia, web 2.0 was Wikipedia, and web 3.0 would turn everything on the web into a massive database.
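The machine-readable vision already exists in embryo as structured markup. Here is a minimal sketch using the real JSON-LD and schema.org standards, with invented page values for illustration:

```python
import json

# A schema.org description embedded in a page gives machines the page's
# meaning, not just its text. (Hypothetical values; JSON-LD and schema.org
# are real standards.)
page_markup = """
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Cybersecurity Risks of Web 3.0",
  "author": {"@type": "Person", "name": "Jane Doe"},
  "datePublished": "2022-05-01"
}
"""

data = json.loads(page_markup)

# A crawler or assistant can now answer "who wrote this, and when?"
# without parsing natural language.
print(data["author"]["name"], data["datePublished"])
```

A page marked up this way is a database row as much as a document, which is exactly what makes it consumable by machines.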
How would it get used? In a word, AI.
Why a machine readable web matters
AI eats data, and the promise of web 3.0 is to make the entire web into consumable data. That would provide a massive AI training set, most of which is currently inaccessible "unstructured data". The result could be a step function in AI capability. Imagine a Google, Siri, or Alexa search that could draw on all of it. Today, if you ask Alexa a question, it might respond with "According to Wikipedia..." and read a web 2.0 article. In the future, it could understand the meaning of everything online and give a detailed answer.
Broadening web 3.0
People noticed that the trend was to "decentralize" the web. Web 1.0 served up content controlled by companies, and web 2.0 consists of platforms controlled by companies hosting user-created content (e.g., Facebook). Why shouldn't web 3.0 provide a new platform for content to be added without a company controlling it? At the same time, blockchain emerged as a way for anyone to post a transaction that would be validated and accepted by the consensus of a community instead of a platform owner. Those uncomfortable with the control web 2.0 platform owners exert over content suddenly envisioned user content on distributed, decentralized platforms.
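The decentralized ingredient can be sketched in a few lines: a hash-linked chain that anyone may extend, which peers accept only if every link validates. This is a toy model of the idea, not any real blockchain protocol:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Deterministic hash of a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def add_block(chain: list, content: str) -> None:
    """Anyone may append content; no platform owner approves it."""
    prev = block_hash(chain[-1]) if chain else "genesis"
    chain.append({"prev": prev, "content": content})

def is_valid(chain: list) -> bool:
    """Peers accept the chain only if every back-link matches — consensus
    by verification, not by gatekeeper."""
    for i in range(1, len(chain)):
        if chain[i]["prev"] != block_hash(chain[i - 1]):
            return False
    return True

chain: list = []
add_block(chain, "user-posted content, no platform owner")
add_block(chain, "another user's content")
print(is_valid(chain))   # the untouched chain validates

chain[0]["content"] = "edited after the fact"
print(is_valid(chain))   # tampering with any earlier block breaks validation
```

The point of the sketch is the trade-off: nobody can silently edit history, but also nobody is in a position to vet what gets appended in the first place.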
Is that a redefinition of Web 3.0? Not entirely. What Tim Berners-Lee described was a web with inherent meaning, which focuses on how data can be consumed. The new definition of a decentralized web focuses on how data gets added. There is no conceptual reason why both can't be right at the same time. I propose that web 3.0 is a platform in which anyone can add content without the control of centralized gatekeepers, AND the content has meaning which can be interpreted by people and machines.
Cyber risks
While the vision sounds amazing, the details are still taking shape, and there are concerns. Cybersecurity practitioners should be nervous about a poorly defined web 3.0 for a number of reasons.
Quality: Web 1.0 relied on the reputation of publishers for accuracy. Web 2.0 lowered data quality, and a lot of online information is just plain wrong (look at all the incorrect posts about COVID or elections). Will the consensus to accept data in web 3.0 include accuracy checks? Who gets to make that decision, what are their qualifications, and what motivates them to be fact-based instead of promoting an agenda?
Manipulation: Intentional manipulation of data that will be used to train AI is a huge concern. People can create bad data to manufacture the results they want, making AI the world's biggest disinformation system. When Microsoft decided to train its chatbot "Tay" by letting it learn from Twitter, people intentionally sent malicious tweets that trained it to be racist. Imagine what a nation state could do by feeding in misinformation or by shifting the meaning of words. How will we find, block, and remove data that is designed to deceive?
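The Tay failure generalizes: any system that learns from open submissions can be steered by flooding it. A toy sketch, assuming a hypothetical "model" that labels a phrase by majority vote over crowd-submitted examples:

```python
from collections import Counter

# Crowd-sourced training data: (phrase, label) pairs anyone can submit.
training_data = [
    ("the vaccine", "safe"),
    ("the vaccine", "safe"),
    ("the vaccine", "safe"),
]

def predict(phrase: str, data) -> str:
    """Naive 'model': the majority label among submissions for the phrase."""
    labels = [lbl for p, lbl in data if p == phrase]
    return Counter(labels).most_common(1)[0][0]

print(predict("the vaccine", training_data))   # honest data wins: 'safe'

# An attacker floods the open pipeline with poisoned submissions...
training_data += [("the vaccine", "dangerous")] * 10

print(predict("the vaccine", training_data))   # ...and the model repeats the lie
```

Real models are far more sophisticated than a majority vote, but the attack shape is the same: whoever can cheaply add volume to the training set can move the output.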
Availability: If our systems depend on data, what happens when that data is unavailable? The web today is full of broken links. Machines will either need to make local copies of everything on the Internet or fetch content on demand (as web 2.0 does today). This could increase our dependency on the availability of systems we have no control over.
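The cache-or-fetch trade-off can be made concrete. In this minimal sketch the "web" is simulated with a dictionary (a real crawler would use HTTP): fetch on demand when the source is up, fall back to a local copy when the link has rotted.

```python
# Simulated web: sources can vanish, as links rot.
live_web = {"https://example.org/data": "original record"}
local_cache: dict = {}

def fetch(url: str):
    """Fetch on demand (web 2.0 style); keep a local copy on success."""
    content = live_web.get(url)
    if content is not None:
        local_cache[url] = content
    return content

def fetch_with_fallback(url: str) -> str:
    """Prefer the live source, but survive a broken link via the cache."""
    content = fetch(url)
    if content is None:
        content = local_cache.get(url, "UNAVAILABLE")
    return content

url = "https://example.org/data"
print(fetch_with_fallback(url))   # live source: 'original record'

del live_web[url]                 # the source disappears
print(fetch_with_fallback(url))   # the cached copy keeps the system running
```

Note the cost of the fallback: the cache may be stale, and anything never cached before its source vanished is simply gone — which is the availability dependency the paragraph describes.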
Confidentiality: There is a lot of content online that was accidentally released, often sensitive data stored in publicly accessible folders. In most cases, nobody notices. With machines scanning and including that data in their knowledge base, we suddenly increase the likelihood of private data not just being found, but actually being used. Do we need new ways to prevent accidental release and misuse?
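One mitigation is scanning content before ingestion. A minimal sketch, with hypothetical page contents and two illustrative (far from exhaustive) secret patterns:

```python
import re

# Publicly reachable files — one of them exposed by accident.
public_pages = {
    "/about.html": "We build widgets. Contact sales for a quote.",
    "/backup/users.csv": "alice,AKIAIOSFODNN7EXAMPLE,555-0199",
}

# Patterns a pre-ingestion filter might flag (illustrative, not exhaustive):
SECRET_PATTERNS = {
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "phone":   re.compile(r"\b\d{3}-\d{4}\b"),
}

def flag_sensitive(pages: dict) -> list:
    """Flag pages a crawler should exclude before adding them to a knowledge base."""
    flagged = []
    for path, text in pages.items():
        hits = [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]
        if hits:
            flagged.append((path, hits))
    return flagged

# The leaked backup gets flagged; the ordinary page does not.
print(flag_sensitive(public_pages))
```

Today that leaked backup mostly sits unnoticed; once machines ingest everything reachable, filters like this move from optional hygiene to a necessity.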
Those are just a few of the issues; more will likely arise as web 3.0 takes shape. Still, it makes sense to consider privacy and security solutions from the start.
A future web without gatekeepers, carrying content meaningful to both people and machines, sounds like a dream come true. We need to design security in to keep that dream from becoming a nightmare.