Website as a digraph

Google works (at least in part) by treating the entire internet as a giant digraph.
Are there any utilities that will let me take a small website, such as for a math class, and automatically generate a digraph for that domain?

I bet you could cobble together a utility using a Perl script on the output of ‘lynx -dump’. The lynx browser, when called with the -dump flag, produces a text file rendering of the URL supplied as its argument, and you can scan that output to start looking for the targets of <a href> tags, which are neatly listed in a separate ‘References’ section at the end of the output. Any target URL pointing to a different domain name could be discarded, while internal links are kept and used to increment the appropriate entry in the adjacency matrix.
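For what it's worth, here is a rough sketch of that idea, in Python rather than Perl. It assumes the 'References' section of the dump lists one numbered URL per line (something like "   1. http://..."), so check that against your own lynx output. The function names, the sample dump, and the example.edu addresses are all made up for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed format of a References line in a lynx dump: "   3. http://host/path"
REF_LINE = re.compile(r"^\s*\d+\.\s+(\S+)\s*$")

def reference_urls(dump_text):
    """Return the URLs listed after the 'References' heading of a dump."""
    urls, in_refs = [], False
    for line in dump_text.splitlines():
        if line.strip() == "References":
            in_refs = True
            continue
        if in_refs:
            match = REF_LINE.match(line)
            if match:
                urls.append(match.group(1))
    return urls

def internal_only(urls, domain):
    """Keep only the URLs whose host is the given domain."""
    return [u for u in urls if urlparse(u).hostname == domain]

def adjacency_matrix(links_by_page):
    """Build a matrix where entry [i][j] counts links from page i to page j."""
    pages = sorted(links_by_page)
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in pages]
    for source, targets in links_by_page.items():
        for target in targets:
            if target in index:  # ignore targets we never dumped
                matrix[index[source]][index[target]] += 1
    return pages, matrix

# Hypothetical dump of the home page of a made-up course site.
sample_dump = """\
   Course home [1]Syllabus [2]Homework [3]Textbook

References

   1. http://example.edu/syllabus.html
   2. http://example.edu/homework.html
   3. http://publisher.example.com/book
"""

links = {
    "http://example.edu/index.html":
        internal_only(reference_urls(sample_dump), "example.edu"),
    "http://example.edu/syllabus.html": ["http://example.edu/index.html"],
    "http://example.edu/homework.html": [],
}
pages, matrix = adjacency_matrix(links)
```

In a real run you would get dump_text for each page from something like subprocess.run(["lynx", "-dump", url], capture_output=True, text=True).stdout instead of a hard-coded string.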

For reference, here’s what the word ‘digraph’ means:



  Digraph \Di"graph\, n. [Gr. di- = di`s- twice + ? a writing, ?
     to write.]
     Two signs or characters combined to express a single
     articulated sound; as ea in head, or th in bath.
     [1913 Webster]


The necessity of distinguishing between external URLs and internal URLs is why I suggested using the intermediate step of ‘lynx -dump’. Although a scripting language could be told how to analyze the HTML files directly, in practice the usage of relative and absolute references is so inconsistent as to make the task at least as demanding as rewriting all references in their absolute versions and weeding out the ones pointing to different domains. This task is already something that lynx does, so you don’t need to reinvent the wheel.
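If you did want to skip lynx and work on the raw HTML, the "rewrite everything as absolute, then weed out other domains" step is roughly what Python's standard urllib.parse can do. A small sketch, with a hypothetical helper name and made-up example.edu URLs:

```python
from urllib.parse import urljoin, urlparse

def absolutize_internal(base_url, hrefs):
    """Rewrite each href as an absolute URL, then drop other domains."""
    site = urlparse(base_url).hostname
    absolute = [urljoin(base_url, href) for href in hrefs]
    return [url for url in absolute if urlparse(url).hostname == site]

# Hypothetical hrefs as they might appear on one page of the course site.
hrefs = [
    "homework.html",                                # relative, same directory
    "../index.html",                                # relative, up one level
    "/notes/week1.html",                            # root-relative
    "http://example.edu/syllabus.html",             # absolute, same domain
    "http://en.wikipedia.org/wiki/Directed_graph",  # absolute, other domain
]
internal = absolutize_internal("http://example.edu/math101/index.html", hrefs)
```

You would still need to pull the href values out of the HTML first, which is the part lynx already does for you.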

To Derleth: in graph theory, mathematicians use the portmanteau ‘digraph’ to save themselves the need to pronounce all the syllables in ‘directed graph’.

This is new to me. Never let it be said that mathematicians are shy about speaking their own language.

There’s also the lovely DAG (pronounced like it’s spelled, not each letter individually) which is a Directed Acyclic Graph.

Granted, every field has its jargon. I can’t count how many times I’ve heard in physics classes “We call this <x>, now keep in mind that this is confusing because <x> doesn’t quite mean what you’re used to it meaning.”

Of course, there’s still the issue of exactly what you want to include. Are links to their FTP server fair game? What about image URLs? They’re treated as pages, but most people wouldn’t consider them pages.
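One way to pin that decision down is a small filter you can adjust to taste. This is only a sketch of one possible policy: the choice to keep http/https and to drop a handful of image extensions is my assumption, not a rule, and the URLs are made up.

```python
import posixpath
from urllib.parse import urlparse

PAGE_SCHEMES = {"http", "https"}  # assumption: FTP links are not pages
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}  # assumption

def counts_as_page(url):
    """True if the URL uses a web scheme and does not look like an image."""
    parts = urlparse(url)
    extension = posixpath.splitext(parts.path)[1].lower()
    return parts.scheme in PAGE_SCHEMES and extension not in IMAGE_EXTENSIONS

# Hypothetical link targets found on a course site.
candidates = [
    "http://example.edu/index.html",
    "ftp://example.edu/files/notes.txt",
    "http://example.edu/images/plot.PNG",
    "https://example.edu/notes/week1.html",
]
page_urls = [url for url in candidates if counts_as_page(url)]
```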

Username/anagram topic?

Not really sure what a digraph is – still not clear after reading the other comments and the wikipedia page – but do you just want a visual site map?

If so, maybe something like PowerMapper.

Old versions of Dreamweaver and FrontPage used to be able to do this as well; not sure if they still do.

A graph is a set of vertices (or nodes) with what are called “edges” (paths) between them. Thinking of a node as a room and an edge as a door, two nodes are connected (or “adjacent”) when there’s an edge between them.

A digraph or directed graph is a graph where the edges only go one direction. If you imagine two floors of a building as nodes, an escalator could be called a directed edge. Another example is a flow chart, since to get from one point to another you follow a one-way arrow on certain conditions.

So what he’s asking for is at least similar to a site map. A page is a node, and if there’s a link to another page, there’s a directed edge TO that page, but not one FROM that page unless there’s an explicit link backwards.

ETA: An undirected graph might be A-B, where if you’re at A you can get to B, and vice versa. A directed graph means A->B is not the same as B->A: in the former you can get to B from A but not vice versa, and the opposite holds for the latter (but note that A<->B is possible).
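To make the A-B versus A->B distinction concrete, here is a tiny sketch that stores a graph as a dictionary mapping each node to the set of nodes you can reach from it in one step. The helper names are made up for illustration.

```python
def add_directed(graph, a, b):
    """Record a one-way edge a->b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set())  # make sure b exists as a node too

def add_undirected(graph, a, b):
    """Record a two-way edge a-b as a pair of one-way edges."""
    add_directed(graph, a, b)
    add_directed(graph, b, a)

undirected = {}
add_undirected(undirected, "A", "B")  # A-B: each is reachable from the other

directed = {}
add_directed(directed, "A", "B")      # A->B: B is reachable from A only
```

Here undirected ends up with B in A's set and A in B's set, while directed has B in A's set and an empty set for B.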

So it’s a fancy term for an organizational chart with arrows? Like this one? File:Main Page Usability.png - Wikipedia

If the directionality of those arrows is really important to you, you might want to ask the maker of that chart how he made it.

While a graph can be used as an organizational chart with arrows, graphs also have some nice mathematical uses, especially in computer science: even the common tree is a special case of a graph, and using graphs in an algorithm can greatly simplify things like calculating the computational complexity. I infer that the point of the project isn’t mapping the website (for purposes of web design or data mining), but rather illustrating the concept of a digraph to a mathematics class in terms of something people use every day to make the subject relatable.

Also, it seems from the description of that picture that the user might have made it by hand with graph drawing software.

Probably. In the corner it says powered by yFiles.

The website says it can take xml files as input and automatically arrange a chart, so if you produce an xml sitemap you might be able to do it that way.
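I don't know which XML format that tool actually expects, so treat this as a guess. GraphML is one common XML format for graphs, and a sketch of writing the link structure out as GraphML with Python's standard library might look like the following. The function name and page names are made up.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def to_graphml(links_by_page):
    """Serialize {page: [linked pages]} as a directed GraphML document."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    graph = ET.SubElement(root, "graph", id="site", edgedefault="directed")
    for page in sorted(links_by_page):
        ET.SubElement(graph, "node", id=page)
    for source in sorted(links_by_page):
        for target in sorted(links_by_page[source]):
            if target in links_by_page:  # only link to known nodes
                ET.SubElement(graph, "edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")

# Hypothetical three-page site.
site_links = {
    "index.html": ["syllabus.html", "homework.html"],
    "syllabus.html": ["index.html"],
    "homework.html": [],
}
xml_text = to_graphml(site_links)
```

Whether a given drawing tool lays this out nicely without extra attributes is something you would have to try.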

ETA: And thanks for the explanation about graphs. Now I remember why I never want to go into mathematics ;)

Graphs aren’t that scary ;). For instance, this forum game is actually a game about making a digraph between two Wikipedia pages. They’re very simple and intuitive, and everything “complicated” about it follows naturally (if not always intuitively) from the basic premise of “you have things and a way to move between them”. You have concepts like path (how to get from point a to point b – which is what that thread is about, identifying a PATH in the DIGRAPH Wikipedia), and things like path length, shortest paths, acyclic paths (does a path contain the same node twice?), acyclic graphs (are all the possible paths acyclic?), and such. It’s certainly not vector calc – it’s a pretty natural way to think about things.
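To show how little machinery those concepts need, here is a sketch of two of them on a toy digraph: a breadth-first search for a shortest path, and a depth-first check for whether the graph has a cycle. The page names are invented and are not real Wikipedia link data.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def has_cycle(graph):
    """Depth-first search; True if some path returns to a node it is still on."""
    ON_PATH, DONE = 1, 2
    state = {}

    def visit(node):
        state[node] = ON_PATH
        for neighbor in graph.get(node, ()):
            if state.get(neighbor) == ON_PATH:
                return True
            if neighbor not in state and visit(neighbor):
                return True
        state[node] = DONE
        return False

    return any(node not in state and visit(node) for node in list(graph))

# Hypothetical link structure between five pages.
wiki = {
    "Escalator": ["Building", "Staircase"],
    "Building": ["Architecture"],
    "Staircase": ["Architecture", "Escalator"],
    "Architecture": ["Mathematics"],
    "Mathematics": [],
}
route = shortest_path(wiki, "Escalator", "Mathematics")
```

The toy graph is not acyclic, because Escalator and Staircase link to each other, so has_cycle(wiki) is True.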

It both impresses and terrifies me that there are people who find this stuff exciting and fun ;)

I am thankful for the work mathematicians do in this world, even though I haven’t the faintest idea what they actually do. I’m sure they improve physics and computers every few years, and that’s good enough for me. I’m glad I’m not one :)

I assume you’re talking about the fact I linked to a game “about” digraphs? Nah, if you click it’s that game where you get two random Wikipedia pages and the goal is to find the fewest links you need to click to get between them. Graph theory can be interesting, but constructing a digraph is almost universally considered pretty boring. It’s like comparing writing a novel to spelling drills: the latter is useful and necessary for the former at some point, but it’s hardly the interesting part.

Thanks for the help; sounds like what I want doesn’t exist as-is, but might be makable with a little ingenuity. For the record, I’m working on a math text; we’d like to take the course website and make it into an illustration for the digraph chapter, but we don’t want to have to track a bunch of links by hand.