Formerly: Graph DB Sidechain with Fluree
"GraphDB Sidechain with Fluree" was the title of the Fund 5 funded proposal. It was later renamed "Logosphere" to facilitate project branding and marketing. Any further communication related to the funded proposal will use the new name.
Because significant time has passed since the proposal submission and the voting, a few things, such as timelines and some project affiliations, have changed, but our commitment to the goal of the original proposal remains the same. We therefore give only a summary of the proposal, omitting details that are no longer relevant. If you'd like to read the original proposal on IdeaScale, you can do so here; however, you need an IdeaScale account to view it.
Problem statement
Off-chain DApp data storage options build silos that limit data's use across the ecosystem and lack comparable on-chain immutability/proof.
The metadata is not easily accessible to users and 3rd parties without inspecting every transaction, and it cannot be displayed partially with certain fields hidden. There is also an abundance of scenarios that exceed the 16 KB limit.
Leaving it up to each DApp developer to decide how to handle this issue will eventually result in a plethora of off-chain databases that do not adhere to any common standard. It will become very difficult to query data for analytical purposes, provide cross-app data sharing, and, most importantly, expect any standard of immutability. This can be described by the oxymoron "Centralized Decentralized App": part of the DApp is on-chain and therefore trustworthy and immutable, but the metadata stored off-chain is not. This will eventually erode trust in the Cardano ecosystem in general, which to a certain degree is already happening in the Ethereum ecosystem, with high-profile cases such as the one described in a CoinTelegraph article, where an artist was able to modify the images of an already sold NFT collection without any permission from its owners.
The Cardano dev community is making earnest attempts to make such situations impossible. One idea is to store the binaries in IPFS as a Merkle DAG, with only the root CID of the DAG stored in the transaction metadata. However, this works well only for the rather trivial use cases of attaching binary files and documents to a transaction. For non-binary metadata to be easily aggregated for analytics, extracted with minimal latency, used as training data for AI, or made to adhere to a standard, a different solution is required. This is why databases exist in the first place: not everything can be solved by a file system. As of now, there is no way to do this without introducing unique, centralized DBs per scenario. This will severely limit the ability to query data spread across providers, companies, and industries, rendering the data useless to anyone but the individual owner of each such DB.
Describe your solution to the problem
Build a provable, data-centric sidechain for DApps with Fluree immutable graph ledger/DB that leverages W3C semantic linked-data standards.
This proposal is about building a data-centric metadata sidechain that will be powered by W3C semantic web standards and provide the ability to link and share data across DApps, all while guaranteeing the same degree of integrity, trust, and provable provenance as the on-chain data.
This will address the 16 KB limit of transaction metadata in Cardano and will allow the whole ecosystem of DApps built on Cardano to flourish with data that could later be harvested, analyzed and used for ontological reasoning by AI systems of the future.
We will provide an API that gives any DApp developer a universal way of storing extended metadata and guarantees that metadata stored outside of the Cardano chain will be just as trusted as the Cardano network itself. This could be accomplished with a data-centric layer that will satisfy the following requirements:
Decentralized: There is no single entity that controls the data
Immutable: Public metadata is accessible forever with guaranteed immutability
Versionable (traceable): If changes need to be made to the metadata, a new immutable version has to be linked to the previous immutable version
Shareable: Some metadata can be made private and shared with 3rd parties according to a security model
Queryable: Can be easily aggregated for analytics use cases and retrieved with minimal latency
Standardizable: The different DApps built in the same business domain should be able to exchange data if they choose to do so, according to a set standard.
Scalable: It has to scale indefinitely with the growing adoption of Cardano and its DApp ecosystem
As a candidate for the common metadata layer, we would like to present Fluree: a Web3 open source semantic graph database. We'll describe here why Fluree is a good candidate for the set of requirements identified above. Brian Platz, a co-founder of Fluree, is a co-proposer and is available to the community for any questions.
Fluree is a decentralized semantic graph database using a blockchain with ledgers consisting of blocks implemented as RDF++ triples called "flakes." This allows Fluree to be used as a side-chain to Cardano, which could be a first step on the roadmap from Charles' "Some Musings about the Roadmap" video, where he dedicates quite a bit of time to side-chains (from 10:45) and even suggests that he'd love to see Catalyst implemented as one such side-chain. We proudly respond with a vision of how this could be done with Fluree.
RDF is a W3C standard used as a foundation for building ontologies and knowledge graphs since the mid-2000s. RDF++ is Fluree's extension to the RDF model that adds a time dimension and a boolean dimension to subject-predicate-object triples.
In relation to the requirements listed above, Fluree is:
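To make the idea of a triple extended with a time and a boolean dimension more concrete, here is a minimal sketch. It is not Fluree's actual storage format or API: the `Flake` class, its field names, the integer time `t`, and the `state_as_of` helper are all assumptions made for illustration. The boolean is read here as "assert" (True) or "retract" (False), and the state at a given time is obtained by replaying the history up to that time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flake:
    """Illustrative triple with two extra dimensions (hypothetical layout)."""
    subject: str
    predicate: str
    obj: str
    t: int    # assumed: logical time that increases with every transaction
    op: bool  # assumed: True = assertion, False = retraction


def state_as_of(flakes, t):
    """Replay the history up to and including time t and return the
    set of (subject, predicate, object) triples that hold at that time."""
    current = set()
    for flake in sorted(flakes, key=lambda f: f.t):
        if flake.t > t:
            break
        triple = (flake.subject, flake.predicate, flake.obj)
        if flake.op:
            current.add(triple)
        else:
            current.discard(triple)
    return current


# Hypothetical history: an NFT image reference is replaced at time 2.
history = [
    Flake("nft:42", "image", "ipfs://A", 1, True),
    Flake("nft:42", "image", "ipfs://A", 2, False),
    Flake("nft:42", "image", "ipfs://B", 2, True),
]

state_at_1 = state_as_of(history, 1)
state_at_2 = state_as_of(history, 2)
```

In this sketch nothing is ever overwritten: the update at time 2 is recorded as a retraction plus a new assertion, so the earlier state remains recoverable by querying as of time 1.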
Decentralized: Fluree can run across a network of servers that participate in transaction validation. There are two types of Fluree nodes, transactor nodes and query nodes, which can run on the existing infrastructure of stake pool operators (SPOs) as well as on the new infrastructure of DApps that adopt the metadata layer (which can also be run by SPOs). An incentivization model still has to be worked out, which for now we would like to leave outside the scope of this proposal. If the community considers this question essential and wants it worked out before the voting, we will work on it and respond with ideas.
Immutable: Fluree combines transactions into immutable time-stamped 'blocks' and locks each block in via advanced asymmetric cryptography, making data completely tamper-proof. Traceability of changes is tied to digital signatures for complete proof and visibility into the data lifecycle.
Versionable (traceable): Fluree extends the RDF triple model with a time dimension and treats any update to data as a timestamped new version. This allows both for issuing queries against any moment in time and for "time travel" with complete historical data visibility. In the NFT-DAO Discord there was quite a debate about whether transaction metadata should be mutable or immutable. Fluree's model allows it to be both: immutable at any given moment in time, yet able to evolve over time with complete traceability back to the original version.
Shareable: Fluree embeds privacy and security permission logic as metadata alongside the RDF data at the source. Its security model is based on "smart functions" that are triggered by conditions and return true or false: for a transaction, whether it may go through; for a query, whether a particular flake (unit of data) should be included in the results. This allows for data sharing, cross-DApp collaboration, and monetized data subscription models.
Queryable: Fluree supports the industry-standard query languages SPARQL and GraphQL, as well as FlureeQL, Fluree's own JSON-based language. It supports queries across different ledgers and network nodes, which makes it very powerful for AI applications that could build ontologies and derive semantic inferences across the whole DApp ecosystem. A Fluree query peer can be embedded alongside your code and serve sub-millisecond query responses. As a code-resident data source, Fluree can power no-fetch code with no downtime.
Standardizable: Because Fluree implements the W3C RDF standard, it can natively support existing semantic standards created and evolved over time for a multitude of domains. This gives a huge head start to domain-specific use cases, such as NFT marketplaces.
Scalable: Fluree claims to scale linearly and indefinitely, in the manner of a CDN (content delivery network). Watch this webinar to see how it achieves this.
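The permission model described under "Shareable" can be illustrated with a small sketch. It does not use Fluree's real smart function syntax: the `Flake` tuple, the `owner_or_public` rule, the `PRIVATE_PREDICATES` set, and the `run_query` helper are hypothetical names. It only shows the general pattern of a predicate that returns true or false for each flake and thereby decides what a query result may contain.

```python
from typing import Callable, Iterable, List, NamedTuple


class Flake(NamedTuple):
    """Illustrative unit of data (hypothetical, simplified to a plain triple)."""
    subject: str
    predicate: str
    obj: str


# In this sketch a "smart function" is any callable that takes the
# requesting identity and a flake and returns True (include) or False (hide).
SmartFunction = Callable[[str, Flake], bool]

# Assumed for illustration: predicates that only the subject itself may read.
PRIVATE_PREDICATES = {"email"}


def owner_or_public(identity: str, flake: Flake) -> bool:
    """Public predicates are visible to everyone; private ones only to the subject."""
    if flake.predicate not in PRIVATE_PREDICATES:
        return True
    return identity == flake.subject


def run_query(identity: str, flakes: Iterable[Flake], rule: SmartFunction) -> List[Flake]:
    """Return only the flakes that the rule allows this identity to see."""
    return [flake for flake in flakes if rule(identity, flake)]


ledger = [
    Flake("user:alice", "name", "Alice"),
    Flake("user:alice", "email", "alice@example.com"),
]

seen_by_alice = run_query("user:alice", ledger, owner_or_public)
seen_by_bob = run_query("user:bob", ledger, owner_or_public)
```

Because the rule is evaluated per flake rather than per record, the same record can be shown in full to its owner and only partially to everyone else.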
As shown above, Fluree is a strong candidate for a standard off-chain layer that is still decentralized. This solution also caters to scenarios where some data or fields contain private information and should not be publicly viewable: Fluree allows creating private ledgers, so that data can be split off into a completely secure part that resides outside of the public decentralized network. To make things even more exciting, the private ledgers can still be linked through multi-queries with the data stored in the public ledgers, giving the best of both worlds. In addition, once data is in Fluree, it can be stored permanently, so there is no risk of referencing deleted data.
We see this off-chain layer working as a side-chain to Cardano, with hash anchors stored in the Cardano transaction metadata that point to the root of the knowledge graph in the data-layer side-chain. This is similar to what has been suggested for binary files with IPFS, with the difference that in this case the data will be linkable, queryable and shareable.
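The hash-anchor idea can be sketched as follows. This is an illustration under stated assumptions, not the project's design: the metadata label `1984`, the field names `anchor` and `alg`, and the helper functions are hypothetical, and sorted-key JSON stands in for whatever graph canonicalization the side-chain would actually use. The point is only that a small, fixed-size digest in the transaction metadata is enough to detect any later change to the off-chain data.

```python
import hashlib
import json

# Hypothetical metadata label, chosen only for this example.
ANCHOR_LABEL = 1984


def anchor_hash(graph_root: dict) -> str:
    """Hash a deterministic serialization of the off-chain data.
    Sorted-key compact JSON is an assumed stand-in for real canonicalization."""
    canonical = json.dumps(graph_root, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_tx_metadata(graph_root: dict) -> dict:
    """Build the small metadata payload that would be attached to a transaction."""
    return {ANCHOR_LABEL: {"alg": "sha256", "anchor": anchor_hash(graph_root)}}


def verify_anchor(graph_root: dict, metadata: dict) -> bool:
    """Check that off-chain data still matches the anchor recorded on-chain."""
    return metadata[ANCHOR_LABEL]["anchor"] == anchor_hash(graph_root)


# Hypothetical off-chain record and its on-chain anchor.
record = {"id": "nft:42", "image": "ipfs://A", "artist": "alice"}
metadata = build_tx_metadata(record)

# Any later modification of the off-chain record no longer matches the anchor.
tampered = dict(record, image="ipfs://B")
```

The payload stays the same size no matter how large the off-chain data grows, which is what lets the bulk of the metadata live outside the transaction while remaining verifiable against it.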
In addition, Fluree will facilitate front-end web development of DApps with the Fluree-React library. One question asked in the comments was how Fluree compares to other distributed databases, such as OrbitDB or Cassandra. These systems are key-value stores, suitable for storing and retrieving large volumes of data, but they neither support W3C semantic web standards nor organize the data into a tamper-proof blockchain ledger as Fluree does. Out of all the open source DB solutions, Fluree is the only one that has been built with the most prominent features of DApps in mind: decentralization, traceability, transparency and proof of provenance. Therefore it is currently the most suitable solution for building data-rich DApp ecosystems.
Implementing a semantic data-centric layer as a Cardano side-chain will open tremendous opportunities for the Cardano DApp ecosystem, potentially making it competitive with more specialized blockchains, such as Flow, VeChain, Ocean Protocol and ChainLink. All of these projects have a data-centric on-chain architecture with the ability to share data between DApps, but as far as we know none of them uses W3C semantic web data standards. Fluree's commitment to the W3C standards serves as a key differentiator from these chains and opens up opportunities for data exchange not only within the Cardano ecosystem, but across other blockchain ecosystems as well.
Another important aspect of this project is that it can contribute significantly to further decentralization of the Cardano blockchain by giving stake pool operators additional incentives to host side-chain nodes and receive rewards from data subscriptions. Currently, the majority of small SPOs don't produce blocks and therefore don't get any rewards, so they have to cover the expenses of running the stake pool infrastructure out of their own pocket. This can hardly be seen as sustainable and could lead to small SPOs shutting down in frustration. Giving small SPOs the opportunity to host side-chain nodes for a reward can be a good incentive to keep operating and contributing to the decentralization of the Cardano network. This was brought up in the comments by Roberto Carlos Morano from Gimbalabs, who has a vision of creating a bundle of APIs and side-chain nodes for SPOs to host. We intend to collaborate with his proposal by including a Fluree side-chain node package in an easily deployable bundle.
All the components of this solution will be released under the AGPL open source license, the same license that Fluree uses. It differs from the Apache 2.0 license in that 3rd parties who offer a modified version of the software 'as-a-service' must make the source code of that version available to its users. This will function as a safeguard against centralized platforms taking the software and releasing closed, centralized solutions on their own terms.
This will be a true open-source initiative, open to contributions from anyone who feels motivated and shares the excitement for this project.