2021-08-07 | Collaboration | Triple Specification
[ 20210807: I am refining this, bootstrapping, getting to base camp, setting up, etc., bringing up the sites, then the specification, then the model, then the active demo and MQTT streams, back and forth, so it is rough and changing quite a bit right now. I'm writing this here, both so that anybody following along can see the progress, but also for myself, as I'll browse my specs as I work on the other sites. ~Aggie ]
Triple Pub is a highly constrained specification that facilitates collaborative creation and visualization of knowledge graphs of various domains and systems.
The triple is an atom of knowledge. Each semantic triple has three components, subject, predicate, and object. The rest of the world, the graph and domains, are fluid, and built with triples.
📚📓🛖🪧Mud Hut Club🔗https://mudhut.club
📚💦👴️⚗️6️⚗️1️🪧Snail Mail Import
📚📓🛖💤✍️2️#️2️#️🌄🪧Aggressive Green Curry.
I need to backfill below with this example. It is much simpler and more concise.
The universe of this specification starts at the origin 0. The first predicate from the origin is 🏺. This means that 0 contains something. The origin contains integers. The entire semantic framework of triple pub is based on this relation.
0 is only used to signify that it contains integers that represent other categories. The last 0 from a tree perspective, when navigating a path, is a graph with 0 as the origin. Note that this mechanism means that the entire original universe with origin 0 could be fork-lifted into another universe with origin 0 and treated as a graph with a sub-origin of 0.
Think of any 0 as a universe in this sense. If we had one model, a data flow of a company, the universe would start at 0. Perhaps we do a data flow for another company. Our universe can be expanded by creating a new origin that contains items called DFDs. This item contains instances of DFDs that are integers. Those integers contain root 0s that are graphs with 0 as the origin that are the full DFD.
An origin never includes another origin, just categories/types. The categories contain an origin for a sub-graph that is a 0.
Three levels up, or the first 0 from a final 0 in a path is the domain. Let's use the example of ACME. If ACME was the universe, and 0 was both origin for the graph and the last 0, then it is also the first 0 in the other direction. The domain would be 💦. If we add another data flow for a different company, then we would have another origin 0. This origin 0 would be the category of DFDs, also of the domain 💦. One company would be ACME and one of a different name.
The non-zero integers name the universe for the sub-graph with a 0 origin under them and contain the root 0 for the graph.
The entire graph is filesystem based so that it can be checked in with git. Relative symbolic links are used to connect branches and leaves. The nature of these links varies by domain. At any particular level there can be an unlimited number of other entities and predicates. In the specific case of a DFD, the only other entities are data at rest (data stores) and "entity" (We have simplified Gane and Sarson notation to not distinguish external entities.)
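As a minimal sketch of how a relative link might be created, assume a hypothetical layout in which each node is a directory named after its integer or data store id. The directory names, the link name flow-D1, and the helper link_relative are illustrative only and are not part of the specification.

```python
import os
import tempfile


def link_relative(link_dir: str, link_name: str, target: str) -> str:
    """Create link_dir/link_name as a relative symlink pointing at target."""
    link_path = os.path.join(link_dir, link_name)
    # Store the target relative to the link's own directory so the tree
    # still resolves after it is cloned or moved as a whole.
    os.symlink(os.path.relpath(target, start=link_dir), link_path)
    return link_path


def demo() -> tuple[str, bool]:
    """Build a tiny hypothetical graph and link process 6 to data store D1."""
    with tempfile.TemporaryDirectory() as root:
        process = os.path.join(root, "1", "6")
        store = os.path.join(root, "1", "D1")
        os.makedirs(process)
        os.makedirs(store)
        link = link_relative(process, "flow-D1", store)
        # Return the stored link text and whether it resolves to a directory.
        return os.readlink(link), os.path.isdir(link)
```

Because the stored target is relative (here it would be ../D1), git can check the link in as-is and it keeps working in any clone.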
The domain symbol and the graph place the triple.
Here is the full entry with a domain, using:
domain graph subject predicate object
Here is how a hybrid document/DFD looks at level 0:
💦 1 6 ↔️ 7
💦 1 6 ↔️ 8
💦 1 8 🏷️ Building
💦 1 6 ↔️ 9
💦 1 9 🏷️ Troubleshooting
💦 1 7 🏷️ Troubleshooting
💦 1 9 🏷️ Operating
💦 1 6 ↔️ D1
💦 1 D1 🏷️ 1980s-1990s
💦 1 10 ↔️ 11
💦 1 11 🏷️ Automation
💦 1 10 ↔️ 12
💦 1 12 🏷️ Replay
💦 1 13 ↔️ 8
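Entries in this five-field form can be split mechanically. The sketch below assumes the fields are separated by whitespace and that everything after the fourth field belongs to the object. The names Entry and parse_entry are hypothetical, and the emoji are written as escapes only to keep the example unambiguous.

```python
from typing import NamedTuple

LABEL = "\U0001F3F7\uFE0F"   # 🏷️ has label
WATER = "\U0001F4A6"         # 💦 the DFD domain used in the entries above


class Entry(NamedTuple):
    domain: str
    graph: str
    subject: str
    predicate: str
    obj: str


def parse_entry(line: str) -> Entry:
    """Split 'domain graph subject predicate object' into five fields."""
    # maxsplit=4 keeps any whitespace inside the object intact.
    parts = line.strip().split(maxsplit=4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return Entry(*parts)


entry = parse_entry(f"{WATER} 1 8 {LABEL} Building")
print(entry.graph, entry.subject, entry.obj)  # 1 8 Building
```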
Consider this triple with a graph and domain:
💦 1-1-2-5 7 ↔️ D2
We don't need the 0 in the path, because 💦 contains an origin which contains instances.
1-1-2-5 is three levels down: 1 is a subprocess of 0 in the graph 1 💦 instance, 2 is a subprocess of 1, and 5 is a subprocess of 2. At that level (1-1-2-5), a subject would be a process, say 7. 7 has a two-way data flow with a datastore D2.
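The reading of 1-1-2-5 above can be sketched in a few lines. This assumes the first integer is the instance within the domain and the rest are nested levels, outermost first. The function names are hypothetical.

```python
def split_graph_path(path: str) -> tuple[int, list[int]]:
    """Return (instance, nested levels) for a path such as '1-1-2-5'."""
    numbers = [int(part) for part in path.split("-")]
    # Assumption: first integer is the instance, the rest are nested levels.
    return numbers[0], numbers[1:]


def ancestors(path: str) -> list[str]:
    """List every enclosing graph path, from the instance down to the parent."""
    parts = path.split("-")
    return ["-".join(parts[:i]) for i in range(1, len(parts))]


instance, chain = split_graph_path("1-1-2-5")
print(instance, chain, len(chain))  # 1 [1, 2, 5] 3
print(ancestors("1-1-2-5"))         # ['1', '1-1', '1-1-2']
```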
The hybrid DFD documentation is a bit odd, because the model is built primarily with data flows and not with topics and sub-topics like documentation is usually done. Think of it like a giant Christmas Tree forming the DFD and the documentation draped around the branches.
In the case of a journal, another type of documentation, the domain is different, because the primary meaning of the triples, particularly the container, is different. A container, signified by 🏺, is used to zoom into levels. For a DFD this is done on process. For documents this is done on sections, and the domain is 📓. For infrastructure the physical location is the zoom (assuming on-prem in this case), and it uses the domain symbol 🏘️. Finally, the DNA, the code and configuration used to construct the infrastructure, whether cloud or on-prem, is signified by 🏗️.
For 🏗️ we are focused on the DNA for a GNU/Linux system compiled from source, although this could be easily modified for other systems. Dependencies form the structure of the graph for the domain. Note that this is essentially DevOps and CI/CD, which takes DNA and expresses it as infrastructure. A core feature of formal systems like this is dealing with dependencies. We are taking this on at triple.pub to illustrate how simple this can be and how consistent with the other models; however, it is unlikely that it is prudent to break apart existing CI/CD for your org. OTOH, if you would like to build some of the visualization tools on System SA Net, the DNA specification covers you end-to-end without needing the rich and complex ecosystems associated with implementing this. The graphs focus on dependencies. The start of the graph assumes a set of packages with a Linux kernel. There are still dependencies within these as far as features go. For instance, Freetype, Harfbuzz, and Pango could be operational enough to load up an X windows kitty terminal, but if there is a change, some of the supply chain needs to be recompiled, and the graphs map that.
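One way to picture "some of the supply chain needs to be recompiled" is a topological sort over the dependency graph. The sketch below uses Python's standard graphlib (3.9+). The edges between the packages are purely illustrative assumptions, not the real dependency data for these projects, and the function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical edges: package -> the packages it depends on.
DEPENDS_ON = {
    "freetype": set(),
    "harfbuzz": {"freetype"},
    "pango": {"freetype", "harfbuzz"},
    "kitty": {"pango"},
}


def build_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order packages so every dependency is built before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def needs_rebuild(depends_on: dict[str, set[str]], changed: str) -> list[str]:
    """Return the changed package and everything downstream, in build order."""
    order = build_order(depends_on)
    dirty = {changed}
    for package in order:
        # A package is dirty as soon as any of its dependencies is dirty.
        if depends_on.get(package, set()) & dirty:
            dirty.add(package)
    return [package for package in order if package in dirty]


print(build_order(DEPENDS_ON))                # ['freetype', 'harfbuzz', 'pango', 'kitty']
print(needs_rebuild(DEPENDS_ON, "harfbuzz"))  # ['harfbuzz', 'pango', 'kitty']
```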
The predicate is (has_X at rest on the filesystem), and the predicates are the prefix in these files:
The graphs in the 🏗️ domain zoom in on dependencies. All parents must be completed before the child.
This includes configurations. For instance
🏘️ 1 2 📛 192.168.52.51
🏘️ 1 2 🏷️ srv-19.example.com
There is a convention on full paths. Say we had a datacenter 1 in India off of 0. In datacenter 1 we had a rack labelled 2a. The server was labelled srv-3. This could be designated fully with these triples:
0 🏺 🏘️
Any 0 by itself is the origin. This can be extended, but for our purposes it covers all defined domains.
🏘️ is_graph 0-2
🏘️ type infrastructure
We are defining a symbol that is type infrastructure. Now we know that origin includes a type called infrastructure. We also know that we expand the graph on physical location.
🏘️ 🏺 0-2-0
We are setting up another base for multiple instances of infrastructure. This allows the schema to be extensible. As of this writing, we are only putting on-prem into this sub-origin. We are expanding on location. We will label all graphs with domain of 🏘️.
0-2-0 🏺 1
We are now at a root of a graph that is published as a set of documentation, or website. In this case, as of this writing, the intent is to use arewedown.com as an example. It will have machines built to support IT Docent. Yet again, we will start at 0, but this time we include abbreviated paths because we are at an instance of 0-2-0. All this means is that https://arewedown.com/1/1/1/ could designate datacenter 1 in India, rack 2a, server srv-3 if we had these triples:
🏘️ 1-1-1-1 🏺 1
We've added the domain as both a shortcut and a mnemonic. The full path would actually be 0-2-0-1-0-1-0-1-1-1-1 off of origin for the server. There is a rough map that shows this here.
The first 1 in the graph is the instance of 🏘️. If we had cloud services, and we wanted to map all cloud services for our org, we could use 2 as the first number. (Believe it or not, if you got formal about this it would be much more complicated. We are taking many shortcuts here, but making sure it is consistent and can leverage existing tools.)
We would have to add labels:
🏘️ 1-1 🏷️ India
🏘️ 1-1-1 🏷️ Datacenter 1
🏘️ 1-1-1-1 🏷️ Rack 2a
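With label triples like these, a readable location can be assembled by looking up each prefix of a path. This is a rough sketch, assuming the triples have already been parsed into (subject, predicate, object) tuples. label_index and breadcrumb are hypothetical helper names.

```python
LABEL = "\U0001F3F7\uFE0F"  # 🏷️ has label


def label_index(triples: list[tuple[str, str, str]]) -> dict[str, str]:
    """Map each node path to its label, using only label triples."""
    return {subject: obj for subject, predicate, obj in triples if predicate == LABEL}


def breadcrumb(labels: dict[str, str], path: str) -> list[str]:
    """Collect the label of every labelled prefix of path, outermost first."""
    parts = path.split("-")
    prefixes = ["-".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return [labels[prefix] for prefix in prefixes if prefix in labels]


triples = [
    ("1-1", LABEL, "India"),
    ("1-1-1", LABEL, "Datacenter 1"),
    ("1-1-1-1", LABEL, "Rack 2a"),
]
print(" / ".join(breadcrumb(label_index(triples), "1-1-1-1")))
# India / Datacenter 1 / Rack 2a
```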
For notifications, if you are willing to enforce uniqueness, you could use the labels as the source. Normally, alarms look something like this:
🏘️ 1 2 🗝️ "cpu%"="100" "memGBfree"="1"
This is how KVPs are passed with a host triple. Note that most metrics likely come through with OOTB software like Telegraf; however, the hostname is common. The host name is the name associated with 1 2, above, and is the same for bots or package installation.
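The key-value object of such an alarm could be unpacked along these lines. The sketch assumes every key and value is wrapped in double quotes with no embedded quotes, as in the example. parse_kvps is a hypothetical name.

```python
import re

# One "key"="value" pair; assumes no double quotes inside keys or values.
PAIR = re.compile(r'"([^"]*)"="([^"]*)"')


def parse_kvps(obj: str) -> dict[str, str]:
    """Turn '"cpu%"="100" "memGBfree"="1"' into a dict of strings."""
    return dict(PAIR.findall(obj))


metrics = parse_kvps('"cpu%"="100" "memGBfree"="1"')
print(metrics)  # {'cpu%': '100', 'memGBfree': '1'}
```

Values are left as strings here. Converting them to numbers is left to whatever consumes the alarm.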
A full package designation, then, would be something like 1-2-1 within the 🧬 domain. A convention is that the last integer is the node. In this case, the definition of the package would be attached as predicates using has_command, has_title, etc. In our particular case, commands are all shell commands, so we already know that they can be displayed. Another convention of the way we are implementing triples, is that the object is parsed by location with spaces, so the object can technically have white space. YMMV. As we will say many times, this is all an example, an approach to analysis using triples. It is not a product. It is expected that anybody using this will adapt the ideas for their own domains. The core idea is that if this is simple enough, that should be easy to do and will contribute to the resilience of systems over time in response to crises.
DNA in our case is built from free software available for download on the internet. Off of 0 is a directory called files with a subdirectory of the filename that has details and resources:
Any number of X.txt files can be added. The priority is to try 1 first. Sources could be a mounted directory on host as an example in 2.txt.
🤖 is a domain focused on applying DNA.
Robots listen for commands in triple format. For instance, if I publish:
🤖 0 1-3-1 ➡️ 2-4
This command means I'm installing package 1-3-1 on 2-4.
A robot domain is about completing jobs.
🤖 0 1-3-1➡️2-4 ✅ 20210404T010024.213Z
This means that the robot completed the job successfully at a specific time. (We will address the logging later in this document and the difference between a logged-at timestamp vs. the time the robot perceived applying the command.)
🛑 predicate means the job failed
📤 predicate is the output of the job. This can get to be quite large, but in our simplified world we are going to pass the entire output over the local messaging bus (MQTT). In our case we are storing the last full output on the filesystem rather than storing the triple.
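A listener might decode these robot results roughly as follows. This is a sketch under assumptions: the job is written as package➡️host without spaces, as in the completed example, and the timestamp is UTC. The status names "done", "failed" and "output" and the function parse_result are hypothetical.

```python
from datetime import datetime, timezone

VS16 = "\uFE0F"   # emoji variation selector, stripped before comparing
ARROW = "\u27A1"  # ➡️ install package on host
STATUS = {"\u2705": "done", "\U0001F6D1": "failed", "\U0001F4E4": "output"}


def parse_result(line: str) -> dict:
    """Parse a robot result line: domain, graph, job, predicate, object."""
    parts = line.strip().split(maxsplit=4)
    if len(parts) < 4:
        raise ValueError(f"not a robot result: {line!r}")
    job = parts[2].replace(VS16, "")
    predicate = parts[3].replace(VS16, "")
    obj = parts[4] if len(parts) == 5 else ""
    package, host = job.split(ARROW)
    result = {"package": package, "host": host, "status": STATUS[predicate]}
    if result["status"] == "done":
        # Assumption: completion times are UTC in this compact ISO 8601 form.
        stamp = datetime.strptime(obj, "%Y%m%dT%H%M%S.%fZ")
        result["at"] = stamp.replace(tzinfo=timezone.utc)
    else:
        result["detail"] = obj
    return result
```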
These can apply to an edge (predicate), subject, or object:
📄 = narrative
💬 = short comment (often the hover tooltip)
🔗 = href link
🏷️ = visible label
Triple Streams have additional fields of signature, log level, timestamp, and human-readable name:
signature ℹ️ 20210404T010024.213Z aggie 🤖 0 1-3-1➡️2-4 ✅ 20210404T010024.213Z
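Splitting a stream line into its extra fields and the triple it carries could look like this. The sketch assumes none of the four leading fields contains whitespace, which may not hold for every human-readable name. StreamRecord and parse_stream_line are hypothetical names.

```python
from typing import NamedTuple


class StreamRecord(NamedTuple):
    signature: str
    level: str
    logged_at: str
    name: str
    triple: str


def parse_stream_line(line: str) -> StreamRecord:
    """Split the four stream fields from the triple that follows them."""
    # Assumption: the four leading fields contain no whitespace.
    parts = line.strip().split(maxsplit=4)
    if len(parts) != 5:
        raise ValueError(f"expected 4 stream fields plus a triple: {line!r}")
    return StreamRecord(*parts)


record = parse_stream_line(
    "signature \u2139\uFE0F 20210404T010024.213Z aggie "
    "\U0001F916 0 1-3-1\u27A1\uFE0F2-4 \u2705 20210404T010024.213Z"
)
print(record.name, record.logged_at)  # aggie 20210404T010024.213Z
```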
Consider Cruft Buster for the document domain and relation to the DFD.
🖼️ must be text defined in markdown format for an image
The text string \n is used for a new line in a label. This is compatible with Graphviz labels, as well as other semantic tools.
Nodes exist only through triples, so there are no disconnected nodes. This makes sense from a data flow perspective. A 🚫 means that any existing predicate between the subject and object is void.
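The effect of 🚫 can be sketched by replaying triples in order. This assumes a void applies only to the subject and object in the order given, since the text does not say whether it is symmetric. apply_stream and nodes are hypothetical names.

```python
VOID = "\U0001F6AB"  # 🚫 voids any predicate between subject and object


def apply_stream(triples) -> dict[tuple[str, str], set[str]]:
    """Replay (subject, predicate, object) triples in order, honouring 🚫."""
    edges: dict[tuple[str, str], set[str]] = {}
    for subject, predicate, obj in triples:
        if predicate == VOID:
            # Drop every predicate recorded so far for this pair.
            edges.pop((subject, obj), None)
        else:
            edges.setdefault((subject, obj), set()).add(predicate)
    return edges


def nodes(edges) -> set[str]:
    """Nodes are derived from surviving triples, so none can be disconnected."""
    return {end for pair in edges for end in pair}


edges = apply_stream([("6", "flow", "7"), ("6", "flow", "8"), ("6", VOID, "7")])
print(sorted(nodes(edges)))  # ['6', '8']
```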
|⬅️||from||1 <- Sales|
|➡️||to||3 -> 5|
|↔️||to and from||4 <-> Accounting|
|🏷️||has label||1 = Accounts\nReceivable|
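As a rough illustration of how these predicates might map onto Graphviz, the sketch below emits DOT text from parsed triples. It assumes labels already carry the literal \n text string, which DOT itself interprets as a line break. to_dot is a hypothetical name and only the four predicates in the table are handled.

```python
TO = "\u27A1"         # ➡️ to
FROM = "\u2B05"       # ⬅️ from
BOTH = "\u2194"       # ↔️ to and from
LABEL = "\U0001F3F7"  # 🏷️ has label
VS16 = "\uFE0F"       # emoji variation selector, stripped before comparing


def to_dot(triples) -> str:
    """Render (subject, predicate, object) triples as a Graphviz digraph."""
    lines = ["digraph G {"]
    for subject, predicate, obj in triples:
        predicate = predicate.replace(VS16, "")
        if predicate == LABEL:
            text = obj.replace('"', '\\"')  # keep the DOT string well-formed
            lines.append(f'  "{subject}" [label="{text}"];')
        elif predicate == TO:
            lines.append(f'  "{subject}" -> "{obj}";')
        elif predicate == FROM:
            lines.append(f'  "{obj}" -> "{subject}";')
        elif predicate == BOTH:
            lines.append(f'  "{subject}" -> "{obj}" [dir=both];')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot([
    ("1", LABEL + VS16, "Accounts\\nReceivable"),
    ("3", TO + VS16, "5"),
    ("1", FROM + VS16, "Sales"),
    ("4", BOTH + VS16, "Accounting"),
])
print(dot)
```

The resulting text can be fed to the dot command line tool or any other DOT reader.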