PayPal and the Next Victim of the Web: Banking

It's old news that the Music Industry had to completely reinvent itself
around its core business, and I'm not even sure they have found a
solution yet.
The disruptor? The Web.

The Web single-handedly brought a complete, established industry to its knees.
Next in line was the News and Publishing industry. Again, very scandalous.
And now, we can already see the red dot on the forehead of the next
pair of victims: Banks and Retailers.

If that sounds too far-fetched for you, remember that neither the Music
industry nor the News industry perceived the size of the threat they
faced until it was upon them, and things are going even faster today.
You need to take a mental leap in order to see how it will happen in
concrete terms. And the reason it is so hard to see is that this is
the result of a complex system with many converging dimensions.

But the endgame is this: Information will flow freely following the
path of least friction. And that includes money of course.

But thinking in such abstract terms leads you nowhere. The trick is
trying to lay out the roadmap: How will this actually happen? What will
happen first? Is there an opportunity for me in any of these
mini-earthquakes?

I recently gave a talk about this and lots of interesting facts and
observations appeared as to which concrete changes would be the first
indicators of a major, subterranean change. Now, after watching the
PayPalX innovate event, I get the feeling that most of the "will
happen" milestones are now in "just happened" status.

* Business Payments over PayPal with small fees
* Some sort of Facebook integration
* Simplified/unobtrusive experience 
* Compelling, radical examples developed atop PayPal API
* Micropayments
* Micropayments
* Micropayments 


And I emphasize Micropayments because they imply a deep cultural
change. Once you learn to pay, say, 50 cents for small things, then a
completely new "instant" economy is born: The Attribution Economy :)

So most of this is already there. With real world use-cases.
Boy was that fast...

But it has to be. Money is a TERRIBLE solution to a real problem (
exchanging goods and services ). Bad solutions get replaced as soon as
a viable alternative is found.

The beauty of all this is that, from a certain level of abstraction,
all the industries that are being negatively affected by the web are
about tapping into some kind of information flow ( music, news, money,
offers ).

* Music and Digital content: Check
* Text Medium based content: Check
* Offers and Commerce: In progress ( we need more structure. Web 3 is
doing its magic here )
* Money: In progress ( we need just enough momentum to overcome
cultural and regulatory issues. PayPal is upping the ante here. Turbo
speed )

The flipside of this is that there is huge opportunity to innovate,
and this also applies to the current behemoths. But. If history has
taught us something, it has taught us that their size makes them blind
and too slow to react.

I give Banks, say, 4 years before they enter the panic phase.

If you haven't already, I highly recommend you go watch the recorded
live stream from the PayPalX innovate event:
www.ustream.tv/paypalx

How do I enrich my (Linked) Data with DBPedia?

This is a question I have heard a couple of times.

( warning: this is a dense technical blog post. It was an email but I decided to post it here. Sorry in advance ).

Well, there are several ways of doing this. But, conceptually, you need to understand that there are three distinct steps in the process, each of which can be accomplished in several ways. ( Note: I use DBPedia as a "toy" example but in real life I use this to create BI and EDI workflows for real life apps ).

The 3 steps are: Linking, Importing and Querying.

 

== Step 1 ==

Linking ( or aligning ) your data with DBPedia's

This is generally about mapping your URIs to those of "equivalent" concepts in the DBPedia namespace. The result of this process is usually a set of owl:sameAs triples, but they could be another sort of alignment ( ABox or TBox ).

More sophisticated tools like Google Refine can give you a hand when it comes to mapping large datasets.

There are other approaches as well.

Finally, for simple literal ID alignments ( like EAN-UCC, SKU, ISBNs, etc ), a simple SPARQL query using string comparison heuristics will work just fine.
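
As a rough illustration of that kind of literal ID alignment, here is a minimal Python sketch ( not the SPARQL query itself ). It assumes you have already extracted URI-to-ISBN pairs from both datasets; the URIs, the ID values and the helper names normalize_id and align_by_id are all made up for the example.

```python
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalize_id(raw: str) -> str:
    """Keep only digits and the ISBN check character X, dropping hyphens and spaces."""
    return re.sub(r"[^0-9X]", "", raw.upper())

def align_by_id(local: dict, remote: dict) -> list:
    """Return owl:sameAs N-Triples lines for entries whose normalized IDs match."""
    remote_by_id = {normalize_id(value): uri for uri, value in remote.items()}
    triples = []
    for uri, raw in sorted(local.items()):
        match = remote_by_id.get(normalize_id(raw))
        if match:
            triples.append(f"<{uri}> <{OWL_SAME_AS}> <{match}> .")
    return triples

# Hypothetical sample data: one local book and one remote resource with the same ID.
local_books = {"http://example.org/book/1": "978-0-00-000000-2"}
remote_books = {"http://example.org/remote/Some_Book": "9780000000002"}

for line in align_by_id(local_books, remote_books):
    print(line)
```

The output is a list of plain N-Triples lines, so it can be loaded into the store like any other RDF file.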

Note to self: Keep an eye on Geospatial Alignment tools.

 

== Step 2 ==

Once you have created one or more owl:sameAs relationships or some other kind of alignment data, you will most certainly want to exploit the result by issuing queries that consider the sum of both datasets. Again, there are several options here and the final strategy will depend on your queries, desired response times, etc. The main question is whether you will load some fraction of DBPedia into your system and, if so, how much of it.

Let me walk you through an example.

If you just want to enrich certain entities ( for example, stealing the labels for cities or for music bands ), then the cheapest and easiest way is to insert only those triples.

One common way to do this in Virtuoso is to take advantage of the built-in sponger ( which is a fancy name for a Linked Data adapter ).

Here's one technique that works pretty well. You can use this for an infinite number of scenarios.

The following is Virtuoso SPASQL, but it could be written as pure SPARQL HTTP as well.

sparql clear graph <XXX>;
sparql define get:soft "soft" select * from <XXX> where { ?s ?p ?o } ;

Where XXX is the URI of a SPARQL construct query.

Say what?

OK. Let's slow down a bit. Suppose you want to retrieve all the labels and descriptions available for Paul McCartney. You can play around with SPARQL in DBPedia and you would probably come up with something like the following:

prefix res: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?label ?abstract where { res:Paul_McCartney dbpedia-owl:abstract ?abstract; rdfs:label ?label  }

( Go see an HTML representation of this query's results here  )

Nice. That's the data you want to add to your app. But how do you store query results into your local Quad Store?

You don't. What you want to store is not the "tabular" select query results, but the data itself. The Triples. Or subgraph if you wish.

No problem, SPARQL construct to the rescue.

prefix res: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
construct 
{ res:Paul_McCartney rdfs:label ?label; dbpedia-owl:abstract ?abstract } 
where
{ res:Paul_McCartney dbpedia-owl:abstract ?abstract; rdfs:label ?label  }

Now, if you run the above query in the default SPARQL endpoint UI for DBPedia ( http://dbpedia.org/sparql ) you will get back an N3/TTL file containing the raw triples for you to insert.

Click here to try it ( Note: your browser will most likely ask you to download a file. Accept and then take a look inside... Yes! triples! ).

The uncompressed URI for the file is pretty long, but it contains your query and some other parameters, effectively exposing a REST API for SPARQL execution ( this transparent HTTP magic is, in fact, an integral part of SPARQL ).

FYI, the URI looks like this ( I removed the prefixes to save some space ).

http://dbpedia.org/sparql?query={{prefixes-go-here}}construct+%0D%0A%7B+res%3APaul_McCartney+rdfs%3Alabel+%3Flabel%3B+dbpedia-owl%3Aabstract+%3Fabstract+%7D+%0D%0Awhere%0D%0A%7B+res%3APaul_McCartney+dbpedia-owl%3Aabstract+%3Fabstract%3B+rdfs%3Alabel+%3Flabel++%7D&format=text%2Frdf%2Bn3

Notice the "format=text/rdf+n3" parameter at the end.

So, remember that XXX placeholder above? This is the URI you should use there. Virtuoso will then perform a GET request, download the file, figure out it is a valid RDF serialization and insert it.
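
If you would rather not copy the long URI out of the browser, it can be composed by hand. The following is a minimal Python sketch, assuming the endpoint accepts the same "query" and "format" parameters seen in the URI above; the function name sparql_url is just something I made up for the example.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

CONSTRUCT_QUERY = """\
prefix res: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
construct
{ res:Paul_McCartney rdfs:label ?label; dbpedia-owl:abstract ?abstract }
where
{ res:Paul_McCartney dbpedia-owl:abstract ?abstract; rdfs:label ?label }
"""

def sparql_url(query: str, fmt: str = "text/rdf+n3") -> str:
    """Build the GET URI for a query; urlencode percent-encodes both parameter values."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": fmt})

url = sparql_url(CONSTRUCT_QUERY)
print(url)
```

The printed string is what goes in place of XXX. It will not be byte-for-byte identical to the URI the endpoint UI produces ( line endings and extra parameters may differ ), but it carries the same query and format.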

Of course, manually composing such a query for each and every one of your aligned resources is a PITA. So I would suggest building a simple script or stored procedure that does the work for you.
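
Such a script could look roughly like the sketch below. It is Python, it only prints the two SPASQL statements for each resource ( you still have to feed them to Virtuoso yourself, for example through isql ), and the names QUERY_TEMPLATE, import_statements and the aligned list are hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Same construct query as above, with the resource URI left as a placeholder.
QUERY_TEMPLATE = (
    "prefix dbpedia-owl: <http://dbpedia.org/ontology/> "
    "prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
    "construct {{ <{uri}> rdfs:label ?label; dbpedia-owl:abstract ?abstract }} "
    "where {{ <{uri}> dbpedia-owl:abstract ?abstract; rdfs:label ?label }}"
)

def import_statements(resource_uri: str) -> list:
    """Return the two SPASQL statements that (re)load one resource's subgraph."""
    query = QUERY_TEMPLATE.format(uri=resource_uri)
    graph = ENDPOINT + "?" + urlencode({"query": query, "format": "text/rdf+n3"})
    return [
        f"sparql clear graph <{graph}>;",
        f'sparql define get:soft "soft" select * from <{graph}> where {{ ?s ?p ?o }} ;',
    ]

# Hypothetical input: the DBPedia URIs you aligned in Step 1.
aligned = ["http://dbpedia.org/resource/Paul_McCartney"]

for uri in aligned:
    print("\n".join(import_statements(uri)))
```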

OK. I hope you are not too dizzy by now. This example may seem like overkill at first, but if you think about it for a while, we are actually doing something very simple yet powerful here. We are taking a subgraph of a remote RDF dataset and "importing" it into your local environment. This is unique to Linked Data due to its use of URIs ( no collisions ), a triple-based KRF ( no need to create tables, just add data ) and, finally, its transparent use of the HTTP protocol.

Hopefully you can build on this to come up with more complex workflows. I have built really crazy things using these simple pieces. Of course this is not the only way, in fact there are thousands of combinations and tools at your disposal. Some ideas include:

  • Downloading TTL files using wget, scripts, etc and loading using stored procedures ( faster in some scenarios )
  • Syncing to/from remote graphs using Virtuoso's RDF Graph Replication feature ( this is very robust and efficient as it is based on time tested and industry-standard SQL replication functionality and uses an optimized "changeset-based" messaging protocol )
  • Downloading a complete DBPedia dump and loading it completely into your machine. This can be used during the alignment process, for example, to run demanding queries, etc.
  • Using Federated SPARQL when it becomes available ( basically, forget about importing, just use multiple endpoints and let the SPARQL engine do the guessing ).

 

== Step 3 ==

Of course ;)
I almost forgot.

If the result of your alignment was a set of owl:sameAs links, you should remember to turn on Virtuoso's owl:sameAs inference:

sparql define input:same-as "yes" select * where ...
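
To make the effect of that switch a bit more tangible, here is a toy Python sketch of what owl:sameAs expansion means for a lookup: asking about your local URI also returns what is known about its DBPedia alias. It only illustrates the idea and says nothing about how Virtuoso implements inference; the triples, the example.org URIs and the function names are invented.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def aliases(uri, triples):
    """Collect every URI reachable from `uri` through owl:sameAs links, in either direction."""
    seen, todo = {uri}, [uri]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if p != SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    todo.append(b)
    return seen

def describe(uri, triples):
    """Return the (predicate, object) pairs of `uri` and of all its aliases."""
    names = aliases(uri, triples)
    return sorted((p, o) for s, p, o in triples if s in names and p != SAME_AS)

# Hypothetical data: one local entity aligned with one DBPedia resource.
triples = [
    ("http://example.org/artist/42", SAME_AS, "http://dbpedia.org/resource/Paul_McCartney"),
    ("http://example.org/artist/42", "http://example.org/schema/albumsInStock", "7"),
    ("http://dbpedia.org/resource/Paul_McCartney", "http://www.w3.org/2000/01/rdf-schema#label", "Paul McCartney"),
]

for pair in describe("http://example.org/artist/42", triples):
    print(pair)
```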