Technical Working Paper - InterMix API
7 Mar 2004 @ 21:54, by Roger Eaton
A previous article, A New Heaven, has an overview of the voice of humanity (voh) concept.
InterMix is middleware; it can be thought of as an "engine" or black box whose working details can be ignored. The Application Programming Interface, or API, describes exactly how programmers can connect their applications to the InterMix engine in order to participate in the voh network. This is where we begin to bring the concept down to earth, and it has to be done right. Our goal is to keep the interface simple enough that web scripters, of whom there are millions, will feel comfortable using it. As you will see, this article is very much a rough draft.
All InterMix API functions will be invoked through http or sometimes https -- "no password in the clear". Using http provides a great deal of flexibility, allowing an InterMix engine to be invoked remotely as well as from a program running on the same box. For GET/POST usage, we will follow the W3C advice in the URIs, Addressability, and the use of HTTP GET and POST page. I am not clear whether we should use http PUTs and DELETEs for item updates instead of POST.
Input and output are in simple XML format. Until the time comes that we actually need to be 100% sure how some valuable resource, such as money, is being handled, we will not be using SOAP, and we are giving XML-RPC a bypass, too, because it seems needless.
Or am I off base, here, in taking what appears to be the simple route? For background, see Sam Ruby's REST SOAP article. In particular I wonder if it wouldn't make sense after all to implement an XML-RPC interface. XML-RPC libraries are available for all the major web languages, and their use might actually facilitate translating the InterMix interface calls from and to the programmer's language of choice instead of making things harder. Could that be right? Advice on these matters would be most welcome. Here is a recent entry from the Atom API group - they have decided not to use XML-RPC.
Output from InterMix always includes a "return_value" and a new "stateID". The return value has two parts, first an ok/not_ok flag, and second a list of message_number / message_text pairs.
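To make the output shape concrete, here is a minimal Python sketch of how a calling program might read a response. The XML layout, the element names (response, return_value, status, message, stateID) and the message numbers are all assumptions made up for this example; the draft does not yet fix an exact format.

```python
import xml.etree.ElementTree as ET

# Hypothetical response layout; element names and message numbers are
# invented for illustration and are not part of the draft.
SAMPLE = """<response>
  <return_value>
    <status>not_ok</status>
    <message number="12">invalid email address</message>
    <message number="15">passwords do not match each other</message>
  </return_value>
  <stateID>abc123</stateID>
</response>"""


def parse_response(xml_text):
    """Return (ok, messages, state_id) from an InterMix-style response."""
    root = ET.fromstring(xml_text)
    rv = root.find("return_value")
    ok = rv.findtext("status") == "ok"
    # Each message is a (message_number, message_text) pair.
    messages = [(int(m.get("number")), m.text) for m in rv.findall("message")]
    # An empty stateID would signal an invalid or timed-out session.
    state_id = root.findtext("stateID", default="")
    return ok, messages, state_id
```

A caller would check the ok flag first, show the message texts to the user if it is false, and keep the returned stateID for the next request.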
InterMix saves the important state variables (such as who the user is, or what list of items is being worked, or what the current item is) in association with the stateID. In the normal course of traffic between InterMix and the user interface program that is calling InterMix, each request includes the stateID from the previous InterMix output. InterMix supports the use of even earlier stateIDs, so as to allow intelligent handling of back button use in browser and browser-like implementations, such as Chandler. Each stateID w/ associated information is saved by InterMix for a hub-settable number of minutes; the default for public hubs is 90 minutes. If an invalid stateID is presented, InterMix sends an "invalid or timed out" error message and returns an empty stateID.
The key to understanding this API is to realize that several structural components of InterMix are themselves InterMix items: hubs, users, categories, dimensions, threads, connections, connection offers, item annotations, item highlights and links, item templates and replies. InterMix items are kept in XML format in a repository.
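The stateID bookkeeping described above could be sketched roughly as follows. This is only an in-memory stand-in written for illustration: the class name, the use of random hex strings as stateIDs, and the injectable clock are my assumptions, not a description of how InterMix will actually store state.

```python
import time
import uuid


class StateStore:
    """In-memory stand-in for the stateID table (hypothetical design)."""

    def __init__(self, ttl_minutes=90, clock=time.time):
        self.ttl_seconds = ttl_minutes * 60  # hub-settable, 90 by default
        self.clock = clock
        self._states = {}  # stateID -> (saved_at, state variables)

    def save(self, variables):
        """Store a snapshot of the state and return a fresh stateID."""
        state_id = uuid.uuid4().hex
        self._states[state_id] = (self.clock(), dict(variables))
        return state_id

    def load(self, state_id):
        """Return the saved variables, or None if invalid or timed out."""
        entry = self._states.get(state_id)
        if entry is None:
            return None
        saved_at, variables = entry
        if self.clock() - saved_at > self.ttl_seconds:
            del self._states[state_id]
            return None
        return dict(variables)
```

Because each response gets its own snapshot rather than overwriting the previous one, a request that arrives with an earlier stateID (the back button case) still finds the state that went with that page.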
Access to the items is via a keyword index file that is separately maintained by InterMix itself. That is, we do not depend on any index being handled by the repository. Instead InterMix itself handles the indexing through its own Berkeley DB keyword index file. (Nothing in this prevents the repository from also doing its own indexing for non-InterMix functions.)
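The idea of a keyword index kept apart from the repository can be sketched with a plain dictionary standing in for the Berkeley DB file. The class and method names are hypothetical, and treating a multi-keyword lookup as "items that carry every keyword" is an assumption of this example.

```python
from collections import defaultdict


class KeywordIndex:
    """Toy keyword index: keyword -> set of MixIDs (dict in place of Berkeley DB)."""

    def __init__(self):
        self._index = defaultdict(set)

    def add_item(self, mix_id, keywords):
        """Record that the item with this MixID carries the given keywords."""
        for kw in keywords:
            self._index[kw.lower()].add(mix_id)

    def lookup(self, *keywords):
        """Return the MixIDs carrying every given keyword, sorted."""
        sets = [self._index.get(kw.lower(), set()) for kw in keywords]
        if not sets:
            return []
        return sorted(set.intersection(*sets))
```

The repository itself is never consulted for the lookup; it is only asked for the item once a MixID has been found.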
ListFunctions
input
stateID
output from InterMix
formal description of the functions listed below
RequestRegistrationLoginAccess
input
stateID
output:
gifkey -- a gif showing 4 random characters, used to prove the requester is human
stateID - tracks this login/registration until timeout or logoff
RegisterNewUser
required input:
stateID from prior registration access request
gifkey_text (the 4 characters in text format)
email_address
username
password
password again
optional input
personal_title / forename / middle_name or initial / surname
aka
general optional item input -- see appendix A
password clue
OK to track - default is FALSE
sex
birthdate (standard InterMix date format is yyyy-mm-dd, see app B for more on dates)
street_address1
street_address2
city
State or Province (standard codes by country are acceptable)
Country Code
home phone
work phone
cell phone
phonetree phone
hours best to contact
company
has-funny-bone-flag (default is TRUE)
builtin InterMix User dimensions:
education
nationality
nation of birth
ethnicity
language
religion
political affiliation
occupation
housing situation
sexuality (orientation)
family situation
health
capabilities
community
(extensible)
list of interests
list of categories to subscribe to
length of output lists (defaults to hub list length default)
output
return value
ok/not ok
invalid gifkey_text
invalid email address
invalid username (must be length >= 3, one space allowed for each 7 characters,
no special characters except underscore, hyphen,
no leading spaces)
username or email already taken
passwords do not match each other
invalid password (must be length >= 7, no whitespace, letters and numbers)
stateID
suggested username (if username already taken)
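The username and password rules listed in the RegisterNewUser error messages can be written down as a short Python sketch. Two readings are assumptions on my part: "one space allowed for each 7 characters" is taken to mean the number of spaces may not exceed the length divided by 7 (rounded down), and "letters and numbers" is taken to mean the password may contain only ASCII letters and digits.

```python
import re

# Letters, digits, underscore, hyphen and space are the only characters allowed.
USERNAME_CHARS = re.compile(r"[A-Za-z0-9_\- ]+")
PASSWORD_CHARS = re.compile(r"[A-Za-z0-9]+")


def valid_username(name):
    """Check the draft's username rules under the assumptions stated above."""
    if len(name) < 3 or name.startswith(" "):
        return False
    if not USERNAME_CHARS.fullmatch(name):
        return False
    # Assumed reading: one space permitted per full 7 characters of length.
    return name.count(" ") <= len(name) // 7


def valid_password(password):
    """Length >= 7, no whitespace, only letters and digits (assumed reading)."""
    return len(password) >= 7 and PASSWORD_CHARS.fullmatch(password) is not None
```

Under this reading "Roger Eaton" is accepted (11 characters, one space), while "a b c" is rejected because a 5 character name is allowed no spaces.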
Login
required input
stateID from prior registration access request
gifkey_text
username or email_address or "anonymous" if hub allows anonymous
password (unless anonymous)
output
return_value
ok/not ok
invalid gifkey_text
login failed
stateID
new gifkey if return_value is not ok
RequestPasswordBeEmailed
required input
stateID from prior registration access request
gifkey_text
email_address
output
return_value
ok -- password will be emailed/not ok
invalid email
email not on file
stateID
RequestItemList
required input
stateID
optional input
max length of list -- default is taken from the user, then from the hub;
InterMix ships with a hub default of 50
next/previous flag - default is next page
sort (moderator specified order, interest, approval, value, name, thread, date added --
default is moderator specified order)
query string
output
return_value
ok/not ok
invalid stateID
invalid keyword
invalid category
no hits
stateID
list of items, with name, MixID and list of parent item MixIDs if available
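Since the functions are invoked over http, a web scripter's call to RequestItemList might amount to little more than building a URL. The sketch below is only a guess at what that could look like: the base URL and the parameter names (function, max_length, page, sort, query) are invented for the example and are not defined anywhere in this draft.

```python
from urllib.parse import urlencode


def item_list_url(base_url, state_id, max_length=None, page=None,
                  sort=None, query=None):
    """Build a GET URL for a hypothetical RequestItemList call."""
    params = [("function", "RequestItemList"), ("stateID", state_id)]
    optional = [("max_length", max_length), ("page", page),
                ("sort", sort), ("query", query)]
    # Leave out anything the caller did not set, so hub defaults apply.
    params += [(name, value) for name, value in optional if value is not None]
    return base_url + "?" + urlencode(params)


url = item_list_url("http://localhost:8080/intermix", "abc123",
                    max_length=20, query="peace talks")
```

Omitted options are simply not sent, which matches the intent that defaults come from the user and then from the hub.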
GetItemFormats
required input
stateID
Item MixID or ItemID (ItemID is sequentially assigned for a hub in base64, so is not huge
like MixID which is randomly assigned yet unique because it is so big -- InterMix
will use the Chandler UUID format and code.)
output
return_value
ok/not ok
invalid MixID or ItemID
item not found
stateID
list of available formats for this item
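To show why a sequentially assigned ItemID stays short, here is a small sketch that writes a hub's sequence number in base 64 digits. The digit alphabet (the URL-safe base64 characters) and the function names are assumptions; the draft does not say which alphabet or encoding will be used.

```python
# Assumed digit alphabet: the URL-safe base64 character set.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def encode_item_id(sequence_number):
    """Write a non-negative sequence number using base 64 digits."""
    if sequence_number < 0:
        raise ValueError("sequence number must be non-negative")
    if sequence_number == 0:
        return ALPHABET[0]
    digits = []
    while sequence_number:
        sequence_number, remainder = divmod(sequence_number, 64)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_item_id(item_id):
    """Inverse of encode_item_id."""
    value = 0
    for ch in item_id:
        value = value * 64 + ALPHABET.index(ch)
    return value
```

With this scheme the millionth item on a hub still has a four character ItemID, whereas a randomly assigned MixID has to be long to be unique across the whole network.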
GetItem
required input
stateID
Item MixID or ItemID
format: native, XML, display HTML, update HTML, or other -- each format is
handled by an InterMix module
flags to control handling of InterMix apparatus that surrounds each item
* include minimal standard InterMix metadata, such as author, date posted
default is TRUE
* include maximal standard InterMix metadata, such as dublin core item type
default is TRUE
* include item core data (e.g. the text of a message)
default is TRUE
* include item rating statistics
default is TRUE
* include top keywords with counts to a limit of (integer, default 50, 0 for unlimited)
default is TRUE
* include list of categories w/ counts to a limit of (integer, default 50, 0 for unlimited)
default is TRUE
* include a list of annotations to a limit of (integer, default 50, 0 for unlimited)
default is FALSE
* include up to (integer, default 3) latest annotations embedded in the item text
default is FALSE
* include a list of hilights to a limit of (integer, default 50, 0 for unlimited)
default is FALSE
* intelligently hilight the item text on the basis of all item hilights
default is FALSE
* hilight the item text for (integer, default 3) latest hilights
default is FALSE
* show all user added links -- for heavily linked pages, this will be tricky -- the idea
is to show a page of links from any given linked character in the item, if that
character has more than one link associated. If only one item is linked from the
character, then of course we go directly there. Default is FALSE.
* show up to (integer, default 5) most highly rated user added links -- in this
scenario, it is not the link that is actually rated, but the item that is linked to.
default is TRUE
* show up to (integer, default 1) most recently added user links
default is FALSE
* show the rating of linked pages for all item links -- it is not clear how this is to
be done, but since this is the heart of the Ant Web concept, where we strengthen
links depending on the judgment of those who traversed those links previously, the
default is TRUE
* include a list of item links to a limit of (integer, default 50, 0 for unlimited) in
rating or date added order (default is rating order)
default is FALSE
output
return_value
ok/not ok
invalid MixID or ItemID
item not found
item format incompatible with flag
stateID
item in requested format as flagged
HilightItemText
required input
stateID
Item MixID or ItemID
selected item text
output
return_value
thank you/not ok
invalid MixID or ItemID
item not found
stateID
hilighted item
RateItem
required input
stateID
Item MixID or ItemID
one or both of the following
interest rating (no opinion, 0,1,2,3,4)
approval rating (no opinion, -1,0,1,2,3)
InterMix logical type
output
return_value
thank you/not ok
invalid MixID or ItemID
item not found
stateID
latest rating stats for item
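The rating ranges for RateItem can be checked with a few lines of Python. In this sketch None stands for "no opinion", and a call with no opinion on both scales is treated as an error because the input says "one or both"; that treatment, the function name and the message wording are assumptions of the example.

```python
INTEREST_RANGE = range(0, 5)   # 0, 1, 2, 3, 4
APPROVAL_RANGE = range(-1, 4)  # -1, 0, 1, 2, 3


def check_ratings(interest=None, approval=None):
    """Return a list of problems; an empty list means the ratings are acceptable."""
    problems = []
    if interest is None and approval is None:
        problems.append("at least one rating is required")
    if interest is not None and interest not in INTEREST_RANGE:
        problems.append("interest rating must be 0 to 4")
    if approval is not None and approval not in APPROVAL_RANGE:
        problems.append("approval rating must be -1 to 3")
    return problems
```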
AnnotateItem
required input
stateID
Item MixID or ItemID
item hilight with curly bracketed annotation embedded -- annotation is always
associated with a hilight
output
return_value
thank you/not ok
invalid MixID or ItemID
item not found
stateID
item with new hilight and annotation
AddItemKeywords
required input
stateID
Item MixID or ItemID
list of keywords
output
return_value
thank you/not ok
invalid MixID or ItemID
item not found
stateID
latest keyword list
RecategorizeItem
required input
stateID
Item MixID or ItemID
list of categories
output
return_value
thank you/not ok
invalid MixID or ItemID
item not found
stateID
latest category list for item
AddLinkInItem
required input
stateID
Item MixID or ItemID
selected item text with link indicated: prelink text<a href="link URI">linked text</a>postlink text --
the link URI may be a web URL or a link to any item in the voh network -- the
format of voh URIs is not yet determined
output
return_value
thank you/not ok
invalid MixID or ItemID
item not found
selected text not found in item
stateID
item with new link (a link is an extension of a hilight)
AddItem
required input
stateID
new-item-xml -- at this point, we do not yet have an exact format
there will be some simple way to pass along
the info from a cgi-post
optional input
addressee list of users and groups
output
return_value
thank you/not ok
duplicate item
category(s) not found
stateID
new item MixID
new item ItemID
UpdateItem
required input
stateID
update-item-xml -- at this point, we do not yet have an exact format
again, there will be a simple way to pass along the input
from a cgi-post or cgi-put
output
return_value
thank you/not ok
stateID
new item MixID
new item ItemID
ReplyToItem -- this function may cause a thread to be created
same as add item, but must also specify item being replied to
More functions to be detailed:
- create a group or category
- name co-moderators of a group
- join group
- add intra-hub connection
- make extra-hub connection offer
- accept extra-hub connection offer
- create dimensions
- add dimensions to categories
- publish/retract online availability
- find currently available hubs
- add templates
- name the hub host and co-hosts
----------------------------------
All dates are in format yyyy-mm-dd -- this format follows the Dublin Core spec. For instance, today is 2004-03-04. InterMix will make an effort to convert dates to the canonical format. see [link]
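The "effort to convert dates to the canonical format" might look something like the sketch below. The list of accepted input formats is an assumption chosen for the example, and ambiguous forms such as 03/04/2004 are deliberately left out. Month names are matched according to the current locale, so the sketch assumes an English locale.

```python
from datetime import datetime

# Assumed set of input formats; the draft does not list which are accepted.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d %b %Y", "%B %d, %Y", "%Y/%m/%d"]


def to_canonical_date(text):
    """Return the date as yyyy-mm-dd, or None if no known format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None
```

Returning None for unrecognized input leaves it to the caller to report an invalid date rather than guessing.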
----------------------------------
An item has the following properties
InterMix ID
originating hub ID
list of originating category IDs
original poster userID and MixID
parent item ID for versioning
multi-part document item ID -- this describes the structure of a multi-part versioned doc
multi-part doc coherence tag list - all parts with the same tag form a coherent version
available format(s)
url
title
creator(s)
description
publisher(s)
creation or first published date
posting date
from address
to address
dc resource type (dc) - why have dc type? why not?
collection
dataset
event
interactive resource
service
software
sound
still image
moving image
text
physical object
dc pointer/review type -- only pointers/reviews have a value here
*InterMix logical type
thing in itself (dcmi type refers to the item)
pointer or review (has two dcmi types, one for pointer and one for item pointed to)
can be rated for itself or as a pointer - as a pointer it is rated for utility/disutility
and interest - dc collection is also a pointer as well as thing in itself
category - is conceptual object and also a dc collection - therefore it can be rated in
three ways, as a thing-in-itself-collection, as a pointer to the items collected, and
as a concept - as a concept it can be rated for interest only
InterMix operational type
web page
email
email list
newsgroup
image
calendar event
calendar ongoing event
chat log
blog
rss feed
InterMix XML template
user
category/perspective
dimension
league (groups that share a dimension)
mime type -- or does this apply to the parts of an item?
identifier (type,id) a tuple, where type is url, isbn etc
source(s) (dc)
*primary language (dc) rfc3066 and iso639
relation(s) - related items
coverage (dc) - place or jurisdiction and time period see [link]
rights (dc) - default is copyright to creator or original poster
list of dimensions with selected keywords for each dimension
optionally each dimension is an item with an address
optionally unselected keywords may be listed
(categories are external keywords)
non dimensional keyword list
item template ordered list
FormattedFlag
XML extension - indefinitely extensible by invoking programs, with rules that
allow for additional keywords from the xml data
item body
----------------------------------
user
all users belong to the builtin User category
the dimensions of the builtin User category are provided by the hub host
the dimensions of the builtin User category delineate possible perspectives for the hub
-----------------
hub
title is name of hub
InterMix ID is the hub ID
hub attributes:
availability (highest/high/low)
list of hub hosts
...
----------------------------------
category/perspective/group
a category is an item
when the category MixID is used as a "category" type keyword for an item,
then that item is "in" the category - i.e. it will show up on lists of items for that category
additional attributes of categories:
moderator list
list of dimensions the category belongs to
list of dimensions the category owns
short description
moderatedFlag
OnlyModCanReplyFlag
OnlyModCanWriteFlag
interest title
approval title
pointer interest title
pointer approval title
default sort
OK to export
CategoryIsPerspective
CategoryIsGroup
list of Perspective keywords
list of Category keywords
a perspective is a category that corresponds to user dimension keywords. For instance, a user may be an English speaker, and therefore an English Speaker perspective can be formed. When a category is a perspective, the name of the category must match a user dimension keyword for one of the dimensions available in the User category. Not every user dimension category need be instantiated as a perspective. Moreover, a perspective may be formed by the intersection of more than one user dimension keyword.
When a perspective is added, every item posted by, created by or rated by a member of the perspective is added to the perspective.
a (membership) group is a category that has a membership -- that is one must join to
participate.
Perspectives are also a form of group. Groups are categories that have members.
Ratings are kept up to date for rapid recall for all categories regardless of whether they are groups or not.
Ratings can be assembled on request for any intersection of groups and categories. Also, ratings can be assembled for any category from any perspective. This is a compute-intensive task, so if it is going to be repeated often, a category should be set up at the intersection so the ratings are kept up to date as ratings are made and new items are posted.
Perspective groups and sub-groups can be written to only by members of the perspective. For a non-member of the perspective to write in reply to a perspective message, the non-member must go to another category and import the message before replying.
--------------------------
ratings file
each rating is kept with its own MixID, that of the rater and of the item
also read by
--------------------------
item/ratings/group file
keeps summary of ratings for each item for each local and linked hub group
visits
ratedby
Icount
Isum
Iavg
Acount
Asum
Aavg
Asumofsquares
controversy
value
same for each group and hub
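The summary fields above lend themselves to being kept as running totals, updated each time a rating arrives. The sketch below shows the counts, sums and averages for interest (I) and approval (A). The draft does not say how controversy or value are computed, so neither is implemented here; the variance property is included only to show one thing a running sum of squares makes possible, and is not meant as the controversy formula. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RatingSummary:
    """Running rating totals for one item within one group (hypothetical)."""
    icount: int = 0
    isum: int = 0
    acount: int = 0
    asum: int = 0
    asumofsquares: int = 0

    def add(self, interest=None, approval=None):
        """Fold one rating into the totals; None means no opinion."""
        if interest is not None:
            self.icount += 1
            self.isum += interest
        if approval is not None:
            self.acount += 1
            self.asum += approval
            self.asumofsquares += approval * approval

    @property
    def iavg(self):
        return self.isum / self.icount if self.icount else None

    @property
    def aavg(self):
        return self.asum / self.acount if self.acount else None

    @property
    def approval_variance(self):
        """Population variance of approval, derived from the running sums."""
        if not self.acount:
            return None
        mean = self.asum / self.acount
        return self.asumofsquares / self.acount - mean * mean
```

Keeping totals this way means a new rating costs a handful of additions per group, with no need to reread the individual ratings file.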
-----------------------------
Instead of MySQL, InterMix will use Berkeley DB. This keeps us compatible with Chandler in licensing terms -- i.e. it should be possible for Chandler to include InterMix in its mainstream build as "part" of Chandler eventually without having the licensing problems that it would have with MySQL.
Technically, the big advantage of using Berkeley DB is that it can be embedded in InterMix instead of being a standalone program, like MySQL, which must be installed and maintained separately. MySQL is certainly very suitable for web server environments, but InterMix is for the end user, not for the technically savvy webmaster.
Moreover, Berkeley DB gives low level control to the programmer instead of requiring the use of SQL. And Berkeley DB uses 48 bit addressing, which allows it to scale to 256 terabytes, and any one field can be up to 4 gigabytes -- that should hold us for a few years! Amazingly, Berkeley DB handles the big-endian / little-endian difficulty, so databases can be moved across operating systems easily. Berkeley DB runs on Unix, Linux, Windows and Mac OS X. Here is the feature list. And here is an excellent article detailing the virtues of Berkeley DB as an embedded system, able to handle unexpected shutdown and never needing to be maintained by the user -- no compaction, no nothing.
It may make sense to use the Berkeley DB XML database also, as the builtin InterMix repository. It is not clear, though, what the advantages might be.
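For anyone who wants to check the arithmetic behind those limits, here it is in Python, using binary units. That the 4 gigabyte field limit corresponds to a 32 bit length is my inference from the number, not something taken from the Berkeley DB documentation.

```python
TIB = 2 ** 40  # bytes in a binary terabyte
GIB = 2 ** 30  # bytes in a binary gigabyte

# 48 bit addressing gives 2**48 addressable bytes.
max_database_tib = 2 ** 48 // TIB

# A 32 bit length (assumed) gives 2**32 bytes per field.
max_field_gib = 2 ** 32 // GIB

print(max_database_tib, "terabytes per database")
print(max_field_gib, "gigabytes per field")
```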
Category: Internet
9 Mar 2004 @ 10:12 by Roger Eaton @209.55.71.130 : PUT, DELETE and state
Using PUT and DELETE is a characteristic of the REST (http://internet.conveyor.com/RESTwiki/moin.cgi/FrontPage) style of web programming. But the very essence of REST is that communication is stateless -- i.e. the server does not keep track of the state of the client. Here is a quote from Roger Costello's Building Web Services the REST Way (http://www.xfront.com/REST-Web-Services.html): "Stateless: each request from client to server must contain all the information necessary to understand the request, and cannot take advantage of any stored context on the server."
The point of being stateless is that a layer of complication is removed from the process. Keeping state on the server, however, is efficient. Without it, one cannot have a session with a single secured login. List handling also suffers without state -- to get page 20, the server would have to rebuild pages 1-19 internally instead of just using state information to proceed to the next page.
No sense pretending to be REST by using PUT and DELETE, therefore. InterMix has to have sessions, so it is not REST. On that basis it makes sense just to use POST for anything that updates the database.
If anyone has an idea how to recast the process so it is truly stateless, please add a comment!
10 Mar 2004 @ 09:21 by : Stateless
Well, there's of course the option of maintaining the state on the client side, like how a browser keeps sending the login information for a password protected page at every single access. Or, if you need page 20 after looking at pages 1-19, you'll need to provide a bit more info to be sure you get what you expect, like what was the last record you saw on page 19. But of course none of that helps you verify on the server side that it is a human you are talking with, and the user wouldn't want to decode numbers from GIF files each time. So I guess you can't avoid sessions.
10 Mar 2004 @ 09:30 by : XML-RPC
No particular reason you should use XML-RPC rather than just POSTing and GETting and reading the results. XML-RPC doesn't do anything else either, of course. So I think the main advantage of XML-RPC is that a lot of techies consider it simple, and it is well supported, so it is a no-brainer to make an XML-RPC call and grab the result. I think it is mainly that it creates the illusion that there's a shared standard that everybody can plug into. Which wouldn't be perceived quite the same way if you just do REST. Despite that, either way, you're still left with the task of actually parsing what you get back.
10 Mar 2004 @ 16:54 by Roger Eaton @204.250.12.246 : re: Stateless
Well, if we always use https, which is the only safe course in any case, I think we could allow the user continued access from the same IP address after just one gif-verification until no access for 90 minutes, something like that, with the user providing only login id and password each access. I still worry about getting https to work reliably. But say that is no problem, does REST give real benefits, or is it basically an illusion, too, like XML-RPC? Another question -- if we are relying on state, even if it is carried on the client side, isn't it just a dodge we are using and we are not being RESTful? One good thing about carrying state on the client is that it unburdens the server, and maybe it eases the programming, too. I wonder. If we just make it easier on the server, but harder on the client programming, then that discourages others from using InterMix as a back end. As it is now, the web scripter just needs to return the last sent stateID, which is easy enough.