
IR & NLP Development Resources

Books

  • Practical Artificial Intelligence Programming in Java [ e-book ]

Text Retrieval Toolkits

Resources in abstractive summarization

While much work has been done on extractive summarization, abstractive summarization has received far less study because it is much harder to achieve (if one holds to a strict definition of abstraction). Existing abstractive methods may not be truly abstractive, and even when they are, they may not be fully automated. This page collects interesting summarization methods that are non-extractive.

Opinosis Opinion & Text Summarization Library

The Opinosis Summarizer is software that generates concise abstractive summaries of highly redundant text. It was primarily used to summarize opinions, so it can also be regarded as opinion summarization software.
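
The published Opinosis method is graph-based: the input sentences are turned into a word graph, and paths that many sentences share signal the redundancy that the summary should capture. The sketch below illustrates only that redundancy idea, under simplifying assumptions: tokens are lowercased, whitespace-split words (the published method also attaches POS tags to nodes), and a path counts only where its words are strictly adjacent (the published method tolerates small gaps). The functions build_word_graph and path_redundancy are hypothetical illustrations, not part of the Opinosis library.

```python
from collections import defaultdict

# Hypothetical helpers illustrating an Opinosis-style word graph;
# NOT the Opinosis library API. Simplified: no POS tags, no gap tolerance.


def build_word_graph(sentences):
    """Build a simplified word graph.

    Returns (positions, edges): positions maps each word to its
    (sentence_id, position) references; edges maps each word to the set of
    words that directly follow it somewhere in the input.
    """
    positions = defaultdict(list)
    edges = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        tokens = sentence.lower().split()
        for pos, token in enumerate(tokens):
            positions[token].append((sid, pos))
            if pos + 1 < len(tokens):
                edges[token].add(tokens[pos + 1])
    return positions, edges


def path_redundancy(path, positions):
    """Count the sentences in which the words of `path` occur consecutively."""
    if not path:
        return 0
    # Start from every occurrence of the first word, then keep only the
    # occurrences that are immediately followed by the next word of the path.
    current = set(positions.get(path[0], []))
    for word in path[1:]:
        following = set(positions.get(word, []))
        current = {(sid, pos + 1) for sid, pos in current if (sid, pos + 1) in following}
    return len({sid for sid, _ in current})


if __name__ == "__main__":
    reviews = ["the battery life is great", "battery life is short", "great battery life"]
    positions, edges = build_word_graph(reviews)
    print(path_redundancy(["battery", "life"], positions))        # 3
    print(path_redundancy(["battery", "life", "is"], positions))  # 2
```

High-redundancy paths such as "battery life is" are the kind of candidates an Opinosis-style summarizer would then score and turn into short summary phrases.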

Sample Summaries: Humans, Opinosis, MEAD

These are sample results from the Opinosis summarization test set. The goal of Opinosis [1] is to generate short abstractive summaries that capture the main idea while remaining readable. If you want to generate your own summaries, you can download the demo version of the Opinosis Summarizer.

Note: In the table below, some topics have multiple human-composed summaries; each one was written by a different person.


References

prepare4rouge - Script to Prepare for Rouge Evaluation

This is a Perl script that takes all of your system-generated summary files and all of your gold-standard (reference) summary files, and prepares them in the format used by the ROUGE evaluation toolkit. In other words, from the input you provide, it produces the following:

  • models/
  • systems/
  • settings.xml
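
For orientation, ROUGE-1.5.5 reads an XML settings file in which each EVAL element pairs one topic's system summaries (PEERS, found under PEER-ROOT) with that topic's reference summaries (MODELS, found under MODEL-ROOT). The following is a hypothetical sketch of generating such a file; the function name build_settings_xml and the example file names are illustrative assumptions, not prepare4rouge's actual behavior, and INPUT-FORMAT TYPE="SPL" assumes plain-text summaries with one sentence per line.

```python
from xml.sax.saxutils import escape

# Hypothetical generator for a ROUGE-1.5.5 settings file; NOT prepare4rouge itself.


def _attr(value):
    """Escape a value for use inside a double-quoted XML attribute."""
    return escape(str(value), {'"': "&quot;"})


def build_settings_xml(evals, peer_root="systems", model_root="models"):
    """evals: list of (eval_id, {peer_id: filename}, {model_id: filename}) tuples.

    Each tuple becomes one EVAL element; peer files are expected under
    peer_root and model (reference) files under model_root.
    """
    lines = ['<ROUGE-EVAL version="1.0">']
    for eval_id, peers, models in evals:
        lines.append(f'<EVAL ID="{_attr(eval_id)}">')
        lines.append(f"<PEER-ROOT>{escape(peer_root)}</PEER-ROOT>")
        lines.append(f"<MODEL-ROOT>{escape(model_root)}</MODEL-ROOT>")
        lines.append('<INPUT-FORMAT TYPE="SPL"></INPUT-FORMAT>')  # one sentence per line
        lines.append("<PEERS>")
        for peer_id, filename in peers.items():
            lines.append(f'<P ID="{_attr(peer_id)}">{escape(filename)}</P>')
        lines.append("</PEERS>")
        lines.append("<MODELS>")
        for model_id, filename in models.items():
            lines.append(f'<M ID="{_attr(model_id)}">{escape(filename)}</M>')
        lines.append("</MODELS>")
        lines.append("</EVAL>")
    lines.append("</ROUGE-EVAL>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative file names for one topic with one system and two references.
    print(build_settings_xml([
        (1, {"1": "topic1.system.txt"}, {"A": "topic1.ref1.txt", "B": "topic1.ref2.txt"}),
    ]))
```

The system and reference files themselves would then be copied into systems/ and models/ so that the paths in settings.xml resolve.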

There are two versions of the script: one prepares the output for evaluation with jackknifing, and the other for the usual evaluation without jackknifing.
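
Jackknifing, as used with ROUGE, works as follows: with M reference summaries per topic, a summary is scored against each of the M leave-one-out subsets of M-1 references, the best score within each subset is kept, and the M scores are averaged. This makes system scores comparable with human scores, since a human summary can be evaluated against the remaining M-1 references. A minimal Python sketch of that averaging step follows; it uses a simplified unigram-recall scorer as a stand-in for real ROUGE-N, and the function names are hypothetical rather than taken from prepare4rouge or ROUGE.

```python
from collections import Counter

# Hypothetical illustration of jackknifed scoring; the scorer is a simplified
# stand-in for ROUGE-1 recall, NOT the ROUGE toolkit's implementation.


def unigram_recall(candidate, reference):
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


def jackknife_score(candidate, references, score=unigram_recall):
    """Average, over every leave-one-out subset of references, of the best score in that subset."""
    if len(references) < 2:
        raise ValueError("jackknifing needs at least two references")
    subset_scores = []
    for left_out in range(len(references)):
        subset = references[:left_out] + references[left_out + 1:]
        subset_scores.append(max(score(candidate, ref) for ref in subset))
    return sum(subset_scores) / len(subset_scores)


if __name__ == "__main__":
    refs = ["the room was clean", "room was very clean", "staff were friendly"]
    print(round(jackknife_score("the room was clean", refs), 3))  # 0.917
```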

Opinosis Dataset - Topic related review sentences

Dataset Type: Text
Format: Topic-oriented opinion sentences for 51 different topics
Domain: hotels, cars, products
Source: Amazon.com, TripAdvisor.com, Edmunds.com

Wrapper over Stanford's POS Tagger

This is a Java-based wrapper over Stanford's NLP POS tagger (English only). It reads the user-specified input file line by line and prints the tagged text in the following format: "that/DT has/VBZ never/RB happened/VBN before/RB ./.". More instructions are in the readme. Example usage: java -Xmx1G -Xms1G -jar Postag1.0.jar "<PATH TO INPUT FILE>" > parsed_output.txt
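
Because each output token is a word and its Penn Treebank tag joined by a slash, splitting on the last slash recovers the pair even for punctuation such as "./." or words that themselves contain a slash. A minimal sketch for reading the output back, assuming one tagged line per input line and UTF-8 text in a file like parsed_output.txt; parse_tagged_line and read_tagged_file are hypothetical helpers, not part of the wrapper.

```python
# Hypothetical helpers for reading the wrapper's "word/TAG" output;
# NOT part of Postag1.0.jar.


def parse_tagged_line(line: str) -> list[tuple[str, str]]:
    """Split a line like "that/DT has/VBZ ./." into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST slash so "./." -> (".", ".") and "1/2/CD" -> ("1/2", "CD").
        word, slash, tag = token.rpartition("/")
        if not slash:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


def read_tagged_file(path: str) -> list[list[tuple[str, str]]]:
    """Parse every non-empty line of a tagged output file (e.g. parsed_output.txt)."""
    with open(path, encoding="utf-8") as handle:
        return [parse_tagged_line(line) for line in handle if line.strip()]


if __name__ == "__main__":
    print(parse_tagged_line("that/DT has/VBZ never/RB happened/VBN before/RB ./."))
```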

rouge2csv - Script to Interpret ROUGE Scores

This is a Perl script that helps interpret ROUGE scores. If you need instructions on setting up ROUGE to evaluate your summarization tasks, go here.
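
As an illustration of the kind of interpretation such a script does, the sketch below collects the per-system Average_R, Average_P and Average_F lines that ROUGE-1.5.5 prints into one CSV row per system and metric. The line shape in the comment (with made-up example values) is an assumption based on typical ROUGE-1.5.5 output, and rouge_to_csv is a hypothetical name; this is not the rouge2csv script itself.

```python
import csv
import io
import re

# Hypothetical converter; NOT the rouge2csv script.
# Assumed shape of a ROUGE-1.5.5 summary line (values are made up):
#   11 ROUGE-1 Average_R: 0.36406 (95%-conf.int. 0.33215 - 0.39694)
ROUGE_LINE = re.compile(
    r"^(?P<system>\S+)\s+(?P<metric>ROUGE-\S+)\s+Average_(?P<kind>[RPF]):\s+(?P<score>[0-9.]+)"
)


def rouge_to_csv(rouge_output: str) -> str:
    """Collect recall/precision/F lines into one CSV row per (system, metric)."""
    rows = {}  # (system, metric) -> {"R": ..., "P": ..., "F": ...}, in first-seen order
    for line in rouge_output.splitlines():
        match = ROUGE_LINE.match(line.strip())
        if match is None:
            continue  # separator lines and anything else are ignored
        key = (match["system"], match["metric"])
        rows.setdefault(key, {})[match["kind"]] = match["score"]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["system", "metric", "recall", "precision", "f_score"])
    for (system, metric), scores in rows.items():
        writer.writerow([system, metric, scores.get("R", ""), scores.get("P", ""), scores.get("F", "")])
    return buffer.getvalue()
```

Keeping the scores as the strings ROUGE printed avoids introducing rounding differences in the CSV.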

OpinRank Dataset - Reviews from TripAdvisor and Edmunds

Dataset Type: Text
Format: Full reviews from TripAdvisor and Edmunds
Domain: hotels, cars

Dataset Overview

This dataset contains full reviews of cars and hotels collected from TripAdvisor (~259,000 reviews) and Edmunds (~42,230 reviews).

Patents

  • Methods and Systems for Activity Based Recommendations
    Kavita Ganesan, Harshal Deo, Monica Dhanaraj
    Patent Application No. US 20100076857. Filed Sep. 25, 2008.

  • Search Clustering
    Kavita Ganesan, Neel Sundaresan, Roopnath Grandhi
    Patent No. US2008120292. Filed Jun. 29, 2007.

  • Visualization of Reputation Ratings
    Kavita Ganesan, Neel Sundaresan, Harshal Deo
    Patent No. US20080256040. Filed Apr. 16, 2007.

Text Mining, IR and NLP References

These are some Text Mining, IR and NLP related reference materials that I find useful.
