Open Source Text Processing Project: SWING

SWING: An Open-Source Text Summarizer from WING

Project Website:

Github Link:


The Summarizer from the Web IR / NLP Group (WING), hence SWING, is a modular, state-of-the-art automatic extractive text summarization system. It uses a supervised learning model to produce informative summaries from multiple topic-related documents. SWING was the best-performing summarizer at the international TAC 2011 competition, earning high marks on the ROUGE evaluation measure.

Two examples are provided below to illustrate usage of SWING:

1) Case 1: Training a model and generating summaries from the trained model

i) Use configuration.conf to set the paths of the training documents, the topic description file, the model summaries, and the training model file.
documents dir=/home/praveen/TAC/data/tac2010/docs/
xml file=/home/praveen/TAC/data/tac2010/test_topics.xml
model file=svm.2010.eval.model
model summaries dir=/home/praveen/TAC/evaluation/tac2010/ROUGE/models/

ii) Similarly, set the configuration for the test environment. Make sure the test environment uses the same model file as the training environment.
documents dir=/home/praveen/TAC/data/tac2011/data/source_documents/
xml file=/home/praveen/TAC/data/tac2011/docs/GuidedSumm_topics.xml
model file=svm.2010.eval.model

iii) Specify the features to be used for sentence scoring:
features = sp, sl, dfs

iv) Navigate to $SWING_HOME/src. Run the model_trainer module, then the summary_generator module.
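
Before running the two modules, it can help to confirm that the training and test settings name the same model file. The following is a minimal Python sketch of such a check, not part of SWING itself. It assumes configuration.conf holds one `key=value` pair per line, as in the snippets above; the helper names `parse_conf` and `same_model`, and the treatment of lines starting with `#` as comments, are assumptions made for illustration.

```python
def parse_conf(text):
    """Parse 'key=value' lines; keys may contain spaces (e.g. 'model file')."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, '#' comments (an assumption), and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        settings[key.strip()] = value.strip()
    return settings


def same_model(train, test):
    """True if both configurations name the same, non-missing model file."""
    model = train.get("model file")
    return model is not None and model == test.get("model file")


train_conf = parse_conf("""
documents dir=/home/praveen/TAC/data/tac2010/docs/
xml file=/home/praveen/TAC/data/tac2010/test_topics.xml
model file=svm.2010.eval.model
model summaries dir=/home/praveen/TAC/evaluation/tac2010/ROUGE/models/
""")

test_conf = parse_conf("""
documents dir=/home/praveen/TAC/data/tac2011/data/source_documents/
xml file=/home/praveen/TAC/data/tac2011/docs/GuidedSumm_topics.xml
model file=svm.2010.eval.model
""")

if not same_model(train_conf, test_conf):
    raise SystemExit("Training and test configurations use different model files")
print("model file:", train_conf["model file"])
```

Splitting on the first `=` only keeps any later `=` characters inside the value intact.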

2) Case 2: Generating summaries from an existing trained model

i) Set the configuration for the test environment, and make sure the features specified are the same as those used to train the model.
ii) Run summary_generator.
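
The feature check in step i) can be sketched in Python as follows. This is an illustration only, under two assumptions: the feature list used at training time was recorded somewhere by the user (the text above does not say whether SWING stores it in the model file), and the order of features is ignored here, which may not hold for the real model. The helper names `parse_features` and `feature_mismatch` are hypothetical.

```python
def parse_features(line):
    """Turn a line such as 'features = sp, sl, dfs' into a list of names."""
    _, _, value = line.partition("=")
    return [name.strip() for name in value.split(",") if name.strip()]


def feature_mismatch(trained, configured):
    """Return (missing, extra) feature names, ignoring order (an assumption)."""
    trained_set, configured_set = set(trained), set(configured)
    missing = sorted(trained_set - configured_set)  # used in training, absent now
    extra = sorted(configured_set - trained_set)    # configured now, not trained
    return missing, extra


# Features recorded when the model was trained (taken from Case 1 above).
trained = parse_features("features = sp, sl, dfs")
# Features in a test configuration that accidentally drops one.
configured = parse_features("features = sp, sl")

missing, extra = feature_mismatch(trained, configured)
if missing or extra:
    print("Feature mismatch. Missing:", missing, "Extra:", extra)
```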

3) Case 2 is useful for evaluating a fixed set of features under varying parameters, such as the redundancy threshold, summary length, and scoring granularity.
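
One way to organize such an evaluation is to enumerate every combination of parameter values up front and run summary_generator once per combination against the same trained model. The Python sketch below only builds the combinations. The key names and values are hypothetical placeholders, since the text above does not give the actual configuration.conf keys for these parameters.

```python
from itertools import product


def parameter_grid(options):
    """Return one settings dict per combination of the given option values."""
    keys = list(options)
    value_lists = [options[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Hypothetical keys and values, for illustration only.
grid = parameter_grid({
    "redundancy threshold": [0.3, 0.5],
    "summary length": [100],
    "scoring granularity": ["sentence", "ngram"],
})

for settings in grid:
    # Each dict would be written into the test configuration before a run.
    print(settings)
```

With two thresholds, one length, and two granularities, the grid holds four settings, each of which reuses the same model file.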

4) Two models are provided in the $SWING_HOME/data directory:
baseline.model = model for TAC 2010 cluster A documents with baseline features (document frequency, sentence position, sentence length)
csi.sentence.model = model for TAC 2010 cluster A documents with baseline+CSI features (document frequency, sentence position, sentence length, category relevance score, category KL divergence)
