Information Overload
by Daniel Hopping
I remember when large corporations did not have as much storage capacity as the average teenager now has on an mp3 player. Memory chips for cameras are now at eight gigabytes. I’m obsolete, and even so my phone, my PDA, and my camera each have a gigabyte of storage, and I can swap those modules into my laptop like we used to do with floppy disks. Google has processed over eight billion web pages. The growth of the web is not slowing down anytime soon. Blogs are growing by 20,000 a day and wikis are starting to enjoy the same growth pattern. (If you want to understand what a wiki is, go to www.wikipedia.org and create new content on a topic that you are knowledgeable about. Wikipedia is an online, free, open encyclopedia that anyone can edit.)
Unstructured information
Unstructured information is doubling about every eight months. Do you remember the old high school math exercise of giving a person a penny on January first and doubling the payment each day? After six weeks of daily payments the person would have received a total of $43,980,465,111.03. When you double something fairly often, the number gets really big, really quickly. Within the next five years the management of unorganized information could become a nightmare for retailers. Keeping the data clean, private and secure will be a major challenge. Getting any use out of it will be an even bigger challenge.
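For readers who want to check the arithmetic, here is a minimal sketch in Python. The function names are made up for illustration. It assumes the penny exercise counts the running total of all daily payments over 42 days, and it applies the eight-month doubling period to a five-year (60-month) horizon.

```python
from decimal import Decimal


def penny_doubling_total(days: int) -> Decimal:
    """Total dollars received when day 1 pays one cent and each payment doubles."""
    # 1 + 2 + 4 + ... + 2**(days - 1) cents is a geometric sum equal to 2**days - 1.
    total_cents = 2 ** days - 1
    return Decimal(total_cents) / 100


def growth_factor(months: float, doubling_period_months: float = 8.0) -> float:
    """How many times larger something is after `months` at a fixed doubling period."""
    return 2 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Six weeks = 42 days of doubling payments.
    print(f"${penny_doubling_total(42):,.2f}")  # $43,980,465,111.03
    # Five years at one doubling every eight months.
    print(f"about {growth_factor(60):.0f} times the starting volume")
```

Under those assumptions, a store of unstructured information that doubles every eight months ends up roughly 181 times its starting size after five years.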
Unstructured data or unstructured information refers to stuff in the computer which does not have a data structure and whose content can’t be understood by normal applications. This includes e-mail, memos, word processing documents, audio, video, presentations, white papers, reports and most web pages.
According to Nelson Mattos, Vice President of Strategy in IBM WebSphere Information Integration Solutions, “Today, almost 85 percent of corporate data is unstructured information.”
That means roughly 85 percent of the content in a retailer’s storage is not readily usable in the day-to-day challenge of becoming more competitive.
There are several things being done to help retailers cope with this explosion of data. One is the new discipline called Unstructured Information Management (UIM). After four years of development, IBM Research has announced the Unstructured Information Management Architecture (UIMA) along with a Software Development Kit (UIMA SDK) for use in creating applications for knowledge discovery and business systems that organize and deliver knowledge to users such as buyers and merchants.
In analyzing unstructured information, UIM applications use a variety of analysis technologies, including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies.
UIM is a technology designed to support a new breed of software applications that can process text within documents and other content sources to understand the latent meaning, relationships and relevant facts buried within unstructured content in retail. This can dramatically increase the effectiveness of CRM applications. It can allow retailers to analyze the content of blogs that contain the name of their store, learn the buzz on their stores in real time, and understand the trend data in their transaction logs.
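To make the blog example concrete, here is a minimal rule-based sketch in Python that finds mentions of a store name in a piece of text and records where each one occurs. It is not the UIMA API. The `Mention` class, the `find_store_mentions` function, and the store names are all hypothetical, and a real application would layer statistical analysis on top of simple pattern matching like this.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    """A span of text where a store name appears (hypothetical structure)."""
    begin: int
    end: int
    text: str


def find_store_mentions(document: str, store_names: List[str]) -> List[Mention]:
    """Return every whole-word, case-insensitive occurrence of the given names."""
    if not store_names:
        return []
    # Try longer names first so "Acme Outlet" is preferred over plain "Acme".
    ordered = sorted(store_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [Mention(m.start(), m.end(), m.group(0)) for m in pattern.finditer(document)]


if __name__ == "__main__":
    post = "Went to acme outlet today. Acme has better prices than Bigmart."
    for mention in find_store_mentions(post, ["Acme", "Acme Outlet"]):
        print(mention.begin, mention.end, mention.text)
```

Recording character offsets instead of just a yes/no match lets later analysis steps look at the words around each mention, for example to judge whether the comment is favorable.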
IBM recently announced plans to make UIMA available through open source. The technology will be presented to the Open Source Technology Group, with availability through SourceForge expected by the end of 2005.
UIMA provides an open software framework with standard interfaces for adding unstructured information analytics to any application. This framework makes it easy to integrate the analytic software tools and end-to-end enterprise applications across several different vendors. UIMA also provides tools to speed the creation of new, reusable analytic software components to handle unstructured information.
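The idea of a common interface for interchangeable analysis components can be sketched in a few lines of Python. This illustrates the general design only, not UIMA's actual interfaces. The `Analyzer` type, the `run_pipeline` function, and the two sample analyzers are assumptions invented for this example.

```python
import re
from typing import Callable, Dict, List

# Hypothetical shared interface: every component takes text and returns findings.
Analyzer = Callable[[str], List[str]]


def find_prices(text: str) -> List[str]:
    """Pick out dollar amounts such as $5 or $19.99."""
    return re.findall(r"\$\d+(?:\.\d{2})?", text)


def find_complaint_words(text: str) -> List[str]:
    """Pick out words from a small, made-up complaint vocabulary."""
    vocabulary = {"rude", "slow", "broken", "refund"}
    return [word for word in re.findall(r"[a-z]+", text.lower()) if word in vocabulary]


def run_pipeline(text: str, analyzers: Dict[str, Analyzer]) -> Dict[str, List[str]]:
    """Run each component on the same text and collect results by component name."""
    return {name: analyze(text) for name, analyze in analyzers.items()}


if __name__ == "__main__":
    review = "The clerk was rude and my $19.99 blender arrived broken."
    results = run_pipeline(
        review,
        {"prices": find_prices, "complaints": find_complaint_words},
    )
    print(results)
```

Because each component honors the same signature, one from a different vendor could be swapped in without changing the code that runs the pipeline, which is the kind of integration the framework is meant to simplify.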
The UIMA SDK (Software Development Kit) is an all-Java implementation of the UIMA framework, and it supports the implementation, description, composition, and deployment of UIMA components and applications.
The UIMA framework can currently be downloaded free of charge from IBM alphaWorks. There you will find more information on this "early adopter" version of the SDK. The alphaWorks SDK is also a test bed for gathering feedback on new features of the UIMA SDK. Its versions may evolve more rapidly and are not tied to specific OmniFind releases. The SDK is supported on a "best can do" basis via the alphaWorks forum. It is available in Linux or Intel versions.
The UIMA framework has already been embedded in IBM products, including IBM WebSphere Information Integrator OmniFind Edition, the first commercially available software platform for processing content based on the UIMA standard. IBM WebSphere Portal Server and Lotus Workplace also use UIMA for content processing.
Copyright © Daniel Hopping.
Article Source: http://www.article-host.com/