i need to extract non-boring words from an
arbitrary string of text. any ideas? this would seem like something
that morbus, aaron or les might be into. basically i want to create a topical list of words
and phrases from arbitrary text. i know the cia has been working on
this for years, but there has to be a poor man’s version of this
somewhere.
i’d like to take text from a chat room
application and construct a custom list of topical keywords based
on the chat histories. these will get fed into jwz’s webcollage,
which will create a composite of images fetched based on
conversational topics. i’ve got most of it running right now, but
there’s a load of noise based on mundane words like “if”, “and”,
“but” etc.
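for what it's worth, the crudest poor man's approach is probably a stop-word filter plus a frequency count. here's a minimal sketch in python, not the code i'm actually running. the `STOP_WORDS` set, the `topical_keywords` function and its `top_n` / `min_length` parameters are all made up for illustration, and the stop list is deliberately tiny; a real one would need to be much longer.

```python
import re
from collections import Counter

# hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a usable list would contain a few hundred entries
STOP_WORDS = {
    "a", "an", "and", "are", "but", "for", "if", "in", "is", "it",
    "of", "on", "or", "that", "the", "this", "to", "was", "with",
}

def topical_keywords(text, top_n=5, min_length=3):
    """Return up to top_n of the most frequent non-stop words in text."""
    # lowercase everything and pull out runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    # trim stray apostrophes left over from quoting
    words = [w.strip("'") for w in words]
    # drop stop words and very short tokens, then count what is left
    counts = Counter(
        w for w in words
        if len(w) >= min_length and w not in STOP_WORDS
    )
    return [word for word, _ in counts.most_common(top_n)]

print(topical_keywords("if the robot and the kitten but the robot", top_n=2))
# → ['robot', 'kitten']
```

this only catches single words, not phrases, and raw frequency will still favour chatty filler that isn't on the stop list, so it's a starting point rather than a fix.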
right about now, i’m wishing i had comments enabled. i know it’s a hard
problem but any ideas are welcome.
of course, i’ll release the code once i make some basic design decisions.