Who Are You Anyway?

July 5, 2013

Social sign-in adds an extra twist to sign-in on the web. While systems like OpenID are often used purely to assert identity (e.g. you are the same person who came here before), OAuth and OAuth 2.0 were always about granting access to data (e.g. you give me permission to know your name and who your friends are). For most developers both get to the same place: someone can log in, and you can reliably and securely know which user in your application they map to. The difference is largely about what other data is available.

Most social sign-in systems grant access to profile information, such as name, gender, email address, age or age range, and other more specific information. They often also grant access to a user’s activities on the identity provider, either explicitly or implicitly: for example, if I sign in with Google+ you can retrieve a list of the people I have circled (or at least the ones I have given you access to), and if I sign in with Twitter you can easily get a list of my tweets. These are powerful extra capabilities that allow relying parties to customise the user experience to better fit their users, hopefully making it richer and better tailored.

The challenge with this tailoring is that we are only ever looking at a single facet of the user’s personality at a time. If we retrieve any actions from a social network, we’re only going to see the kinds of things the user wants to share or do on that network. Sometimes the distinctions are fairly predictable (it’s easy to imagine a given person having more professional interactions on LinkedIn), but many other cases may not be so obvious, and will depend on the kinds of friends and connections people have on the various networks.

This is one of the reasons that giving users a choice of identity providers can be valuable, and why it can be tricky to ask users to connect multiple providers: there may be a mismatch between how they use and how they view the different services.

One of the interesting side-effects of Google+ Sign-In is that it is easy to add scopes for any other Google service, and there are enough of them that you can actually hit these same issues of tailoring content depending on the services you have access to.

As a quick experiment, imagine we were suggesting interests based on activity on a sign-in provider. This is a common case for many networks that want to suggest content streams to follow. We might do this by looking at two easily accessible sources: my Google+ posts and my YouTube likes. In this case I haven’t done anything particularly clever with either, just grabbing a page of results and looking for the most popular nouns.

My YouTube likes have words like: google, web, rancilio, app, silvia, i/o, prairie and dogs. My Google+ posts: google+, sign-in, google, developers, php, people, version. Even in this little sample for two different services I use quite closely together, different things are appearing - my Google+ stream clearly has more work-related posts (Google+ Sign-In for example), while my coffee machine makes an appearance in my YouTube list*.

This means that if an application were trying to form a picture of my interests by blending these sources, it might fail. Being too general can mean missing the compelling content that gets me to use the application in the first place, which is exactly what these systems are built to avoid. It may be best to look at just one source, and focus on the key interests from there.

If you want to try it, you can do so here - it’s just some Javascript, so there’s nothing that will Snowden your results or similar.

In case you’re interested in what the code is doing: we first need to request access to the YouTube readonly and Google+ login scopes, by setting our sign-in button’s scope parameter to “https://www.googleapis.com/auth/youtube.readonly https://www.googleapis.com/auth/plus.login”. Then we can query the relevant APIs: plus.activities.list for Google+ posts, and playlistItems.list for YouTube, though for YouTube we first have to call channels.list for the user to get the ID of their liked-videos playlist. We extract the title and description from each item and send them to our processor.
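A minimal sketch of the extraction step, in Javascript to match the demo. It assumes the API responses have already been fetched (through the client library or plain HTTP) and only shows pulling text out of them. The field names follow the YouTube Data API v3 (channels.list with part=contentDetails, playlistItems.list with part=snippet); for Google+, treating each activity’s title plus object.content as its "description" is an assumption, and the helper names are hypothetical.

```javascript
// Scopes from the post, space-separated as the sign-in button expects.
const SCOPES = [
  "https://www.googleapis.com/auth/youtube.readonly",
  "https://www.googleapis.com/auth/plus.login",
].join(" ");

// Hypothetical helper: read the "liked videos" playlist ID from a
// channels.list response (requested with mine=true, part=contentDetails).
function likesPlaylistId(channelsResponse) {
  const channel = (channelsResponse.items || [])[0];
  return channel ? channel.contentDetails.relatedPlaylists.likes : null;
}

// Hypothetical helper: title + description for each video in a
// playlistItems.list response (part=snippet).
function youtubeTexts(playlistItemsResponse) {
  return (playlistItemsResponse.items || []).map(
    (item) => `${item.snippet.title}. ${item.snippet.description}`
  );
}

// Hypothetical helper: text for each post in a plus.activities.list
// response. Assumption: title plus object.content stands in for a description.
function plusTexts(activitiesResponse) {
  return (activitiesResponse.items || []).map((item) => {
    const html = (item.object && item.object.content) || "";
    // Activity content is HTML, so crudely strip tags and tidy whitespace.
    const text = html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
    return `${item.title || ""}. ${text}`.trim();
  });
}
```

The resulting strings are what would be handed to the part-of-speech processing.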

Next, we take advantage of JSPos, a simple part-of-speech tagger written in Javascript. Part-of-speech taggers attempt to assign each word in a sentence a grammatical part of speech (verb, noun, adjective and so on) based on a list of word mappings plus some transformation and positional rules. We run the tagger on each sentence we get and keep only the nouns, to give us a rough summary of what the text is talking about.
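The noun filtering can be sketched as below. This assumes the tagger returns [word, tag] pairs using Penn Treebank tags, which is our understanding of JSPos’s output; the JSPos calls appear only in a comment and are not run. extractNouns and NOUN_TAGS are hypothetical names.

```javascript
// Penn Treebank noun tags: NN (singular), NNS (plural),
// NNP (proper singular), NNPS (proper plural).
const NOUN_TAGS = new Set(["NN", "NNS", "NNP", "NNPS"]);

// Keep only the nouns from a list of [word, tag] pairs, lowercased so that
// "Google" and "google" are counted together later.
function extractNouns(taggedWords) {
  return taggedWords
    .filter(([, tag]) => NOUN_TAGS.has(tag))
    .map(([word]) => word.toLowerCase());
}

// Assumed usage with JSPos in the browser (not executed here):
//   const words = new Lexer().lex(sentence);
//   const tagged = new POSTagger().tag(words);
//   const nouns = extractNouns(tagged);
```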

Finally, we count those nouns up and emit the most common ones.
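The counting step is plain term frequency. Here is a short sketch; topNouns is a hypothetical name and n is however many interests you want to show.

```javascript
// Count how often each noun appears and return the n most common,
// most frequent first. Ties keep first-seen order, because Map preserves
// insertion order and Array.prototype.sort is stable (ES2019+).
function topNouns(nouns, n) {
  const counts = new Map();
  for (const noun of nouns) {
    counts.set(noun, (counts.get(noun) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([noun, count]) => ({ noun, count }));
}
```

Because raw counts are used, a single item with a very long description can dominate the list.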

When building sign-in and requesting access to services, letting users control what is being considered and processed can be very helpful. For an interests-based example like the one above, before signing in with Google+, the application could ask the user whether they would also like the app to extract interests from YouTube.

If the application uses multiple functions from a provider, such as extracting interests but also finding friends, offering users the chance to opt into just one of them could make them more comfortable connecting accounts. Again, the application could give them options when connecting, say “find friends” and “tailor content”, and allow users to choose either or both. That way users stay in control of which face they present to the application, and the application still gets the benefit of connecting the accounts.
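One way to wire up these choices is to map each opt-in feature to the scopes it needs, and request only the union of what the user ticked. This is a sketch under assumptions: the feature names and the mapping are hypothetical, and only the two scope URLs come from the post.

```javascript
// Hypothetical mapping from user-selectable features to OAuth scopes.
const FEATURE_SCOPES = {
  findFriends: ["https://www.googleapis.com/auth/plus.login"],
  tailorContent: ["https://www.googleapis.com/auth/plus.login"],
  youtubeInterests: ["https://www.googleapis.com/auth/youtube.readonly"],
};

// Build the space-separated scope string for only the selected features,
// de-duplicating scopes that several features share.
function scopesFor(selectedFeatures) {
  const scopes = new Set();
  for (const feature of selectedFeatures) {
    for (const scope of FEATURE_SCOPES[feature] || []) {
      scopes.add(scope);
    }
  }
  return [...scopes].join(" ");
}
```

A user who only wants friend-finding would then be asked for the Google+ login scope alone, and YouTube access would be requested only if they opt into YouTube-based interests.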

* As does a video about prairie dog language, which happens to have a particularly long description. This is why scoring is generally done with something slightly more sophisticated than raw term frequency!