Most ECM projects revolve around managing content and using it in business processes. We are very familiar with the benefits of managing content, preserving it, and using it in business transactions. The majority of ECM investments go toward ensuring that content is not lost, that it is available to the relevant people at the right time, and that it is presented to users so they can complete the task at hand. Documents are also searched for after their active use in a business transaction is over.
The underlying assumption has always been that content needs to be presented to a user when she needs it. So we attach necessary and sufficient metadata to our content, or even make it full-text searchable. 90% of ECM users stop right there in their use of the content they have at their disposal.
The digital world is witnessing an analytics wave. Trends and insights are the most sought-after buzzwords in the industry right now. Big and small data are being analyzed left and right in search of the grains of wisdom that could ultimately provide that elusive competitive advantage. One interesting observation from AIIM research is that it is far easier to gain insights from publicly available data than from an organization's internal resources. It is quite true that the first place anybody would look for a piece of information is Google and not an internal ECM repository. But there is a huge difference when it comes to looking for insights. The information that we keep in our internal repositories is far more relevant to our organization than what Google can provide. So there has to be an effort to utilize the content that we store.
This is precisely where Content Analytics comes in. Even though the content analytics tools and technologies available today can in no way match the human brain's ability to decipher information, they provide a good start. Additionally, such tools can process vast amounts of content, extract information, and to an extent interpret its meaning. Most of the tools are self-learning, so the analytics improve over time. Content Analytics works a lot better with semi-structured information such as Twitter feeds and Facebook comments than with unstructured long-form content.
At a high level, Content Analytics goes through four major steps: bringing in content, extracting information, analyzing it, and generating output.
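To make the four steps concrete, here is a minimal sketch of how they might chain together. The function names (`ingest`, `extract`, `analyze`, `generate_output`, `run_pipeline`) are hypothetical and do not come from any particular product, and each stage is a deliberately trivial stand-in for what a real tool would do.

```python
from typing import Iterable


def ingest(sources: Iterable[str]) -> list[str]:
    """Step 1, bring in content. Assumption: each source is already a raw string."""
    return list(sources)


def extract(raw_items: list[str]) -> list[str]:
    """Step 2, extract information. Here: collapse whitespace to get clean text."""
    return [" ".join(item.split()) for item in raw_items]


def analyze(texts: list[str]) -> list[dict]:
    """Step 3, analysis. Stand-in only: count the words in each item."""
    return [{"text": t, "word_count": len(t.split())} for t in texts]


def generate_output(results: list[dict]) -> list[str]:
    """Step 4, generate output. Here: one tab-separated line per item."""
    return [f"{r['word_count']}\t{r['text']}" for r in results]


def run_pipeline(sources: Iterable[str]) -> list[str]:
    """Run the four stages in order, each feeding the next."""
    return generate_output(analyze(extract(ingest(sources))))
```

The point of the sketch is the shape: each step consumes the previous step's output, so any one stage can be swapped for a more capable implementation without touching the others.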
To bring content into an analytics tool, one can employ crawlers or import mechanisms. Crawlers are common among the commercial tools available. They let the analytics platform watch specified sources for new information and bring content in as and when it becomes available. Crawlers can work with internal content stores, including ECM repositories and shared drives, or even with the Internet. Most tools also provide options to push content to the platform, either manually or automatically.
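As an illustration of the crawling idea, the sketch below polls a shared-drive folder and reports only files that are new or have changed since the previous poll. `FolderCrawler` is a hypothetical class written for this example, not the API of any commercial crawler, and it assumes that a changed modification time is a good enough signal for "new content".

```python
from pathlib import Path


class FolderCrawler:
    """Hypothetical crawler: finds files that are new or modified since the last poll."""

    def __init__(self, root: str, pattern: str = "*.txt"):
        self.root = Path(root)
        self.pattern = pattern
        # Maps each file path to the modification time (ns) seen on the last poll.
        self._seen: dict[Path, int] = {}

    def poll(self) -> list[Path]:
        """Walk the folder tree once and return new or changed files, sorted by path."""
        new_or_changed = []
        for path in sorted(self.root.rglob(self.pattern)):
            if not path.is_file():
                continue
            mtime = path.stat().st_mtime_ns
            if self._seen.get(path) != mtime:
                self._seen[path] = mtime
                new_or_changed.append(path)
        return new_or_changed
```

Calling `poll()` on a schedule approximates the "as and when it is available" behavior. A crawler for an ECM repository or a website would replace the folder walk with that system's own listing mechanism.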
The next step is to extract information from the content. The content could come in many forms: plain text, office documents, images, audio, or video. Information needs to be extracted in text form to feed into an analytics module, so this step employs filters that pull text out of the input content. A wide range of tools and technologies is available to help extract information from varied source formats.
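A filter of this kind can be sketched for one simple format, HTML, using only Python's standard library. `TextFilter` and `html_to_text` are names invented for this example. Office documents, images, and audio would each need their own filter (a document parser, OCR, or speech-to-text), which is not shown here.

```python
from html.parser import HTMLParser


class TextFilter(HTMLParser):
    """Collects visible text from HTML, ignoring script and style blocks."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def html_to_text(markup: str) -> str:
    """Return the visible text of an HTML string as a single line."""
    parser = TextFilter()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

Whatever the source format, the output of this step is the same: plain text that the analysis module can read.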
The analysis step is the most crucial part of content analytics. This is where the software reads and analyzes the inputs by running text analytics algorithms. Text analytics involves sentence detection, tokenization, part-of-speech tagging, classification or annotation, entity and relationship identification, and so on.
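The first few of these operations can be illustrated with a rule-based toy. The sketch below does sentence detection and tokenization with regular expressions, then flags capitalized words that do not start a sentence as candidate entities. This is a crude heuristic chosen for brevity; real tools rely on trained models, and the function names here are hypothetical.

```python
import re
from collections import Counter

# A sentence ends at '.', '!' or '?' followed by whitespace (naive: breaks on "Dr. Smith").
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A token is a run of letters/digits, optionally followed by an apostrophe suffix ("brain's").
TOKEN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def split_sentences(text: str) -> list[str]:
    """Sentence detection by punctuation."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Tokenization: split a sentence into word tokens, dropping punctuation."""
    return TOKEN.findall(sentence)


def candidate_entities(tokens: list[str]) -> list[str]:
    """Naive entity spotting: capitalized tokens that are not sentence-initial."""
    return [t for t in tokens[1:] if t[0].isupper()]


def analyze_text(text: str) -> dict:
    """Run the three steps and count how often each candidate entity appears."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    entities = Counter(e for toks in tokens for e in candidate_entities(toks))
    return {"sentences": sentences, "tokens": tokens, "entities": entities}
```

Part-of-speech tagging, classification, and relationship identification build on exactly these outputs, which is why sentence and token boundaries are usually computed first.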
Once text analysis is complete, the tools let the analyzed and extracted information be formatted according to downstream processing needs and then exported to the relevant systems.
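What "formatted according to downstream needs" can mean in practice is sketched below: the same analysis records are rendered once as JSON Lines and once as CSV. The record fields (`doc_id`, `entity`, `count`) and the function names are assumptions made for this example. An actual export format would be dictated by the receiving system.

```python
import csv
import io
import json


def to_json_lines(records: list[dict]) -> str:
    """One JSON object per line, with keys sorted for stable output."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def to_csv(records: list[dict], fields: list[str]) -> str:
    """CSV with a header row; keys not listed in `fields` are dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For example, a reporting database might take the CSV form, while a search index or workflow engine might prefer the JSON records.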
Content analytics will become more and more prominent as time goes by, and the technology will evolve to a much higher level of acceptance. Even though many commercial products are available, much of the research is pioneered in the open source domain. In my opinion, content analytics is one of the technologies that we as ECM professionals can quickly adopt to provide considerable value to our customers.