Feel free to send comments and update requests to firstname.lastname@example.org.
ROWS --> METHODOLOGIES
COLUMNS --> TOPICS
Each article in the corpus is represented as a weighted combination of (topic, methodology) pairs.
This means that each article adds a total weight of 1.0 to the matrix, split among one or more cells.
These weights are then summed over the entire corpus, and the color intensity of each cell represents the logarithm of that cell's total weight.
The panel on the right lists the 10 articles with the highest weight in the selected cell.
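A minimal sketch of the aggregation just described, in Python. The page does not say exactly how an article's unit weight is split across cells; this sketch assumes (hypothetically) that each cell receives the product of the article's methodology proportion and topic proportion, which sums to 1.0 whenever both proportions do. The function names `article_cells`, `build_heatmap` and `top_articles` are illustrative, not taken from the original tool.

```python
import math
from collections import defaultdict


def article_cells(topic_dist, method_dist):
    """Split one article's unit weight across (methodology, topic) cells.

    Assumption (hypothetical): cell weight = methodology proportion *
    topic proportion, so the weights sum to 1.0 when both inputs do.
    """
    return {
        (m, t): pm * pt
        for m, pm in method_dist.items()
        for t, pt in topic_dist.items()
        if pm * pt > 0
    }


def build_heatmap(articles):
    """articles: dict mapping article id -> (topic_dist, method_dist)."""
    totals = defaultdict(float)   # cell -> weight summed over the corpus
    per_cell = defaultdict(list)  # cell -> [(weight, article id), ...]
    for art_id, (topics, methods) in articles.items():
        for cell, w in article_cells(topics, methods).items():
            totals[cell] += w
            per_cell[cell].append((w, art_id))
    # Color intensity: logarithm of the total weight in each cell.
    intensity = {cell: math.log(w) for cell, w in totals.items()}
    return totals, intensity, per_cell


def top_articles(per_cell, cell, k=10):
    """Return the ids of the k articles with the highest weight in a cell."""
    ranked = sorted(per_cell.get(cell, []), key=lambda p: (-p[0], p[1]))
    return [art_id for _, art_id in ranked[:k]]
```

For example, with `corpus = {"a1": ({"genomics": 0.7, "ecology": 0.3}, {"survey": 1.0}), "a2": ({"genomics": 1.0}, {"survey": 0.5, "simulation": 0.5})}`, the cell totals sum to 2.0, one unit per article, and `top_articles(per_cell, ("survey", "genomics"))` ranks `a1` (0.7) ahead of `a2` (0.5). Taking the logarithm keeps a few very crowded cells from washing out the color scale for the rest.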
Articles are classified independently into coherent groups of subjects and coherent groups of methodologies. Each grouping is learned with an unsupervised learning algorithm known as Latent Dirichlet Allocation (LDA), which gathers words that tend to occur together into topics. Each group is then represented by its most significant words (in our case, subject words or methodology words), which can be seen in the LDA visual representation below.
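The page does not specify which LDA implementation or inference method was used. As an illustration only, here is a minimal collapsed Gibbs sampler for LDA in plain Python; a production pipeline would more likely rely on a library such as gensim or scikit-learn. Assuming, hypothetically, that subject words and methodology words are extracted separately from each article, the same function would be run twice, once per vocabulary, to obtain the two independent groupings. `alpha` and `beta` are the usual Dirichlet priors, and the defaults here are arbitrary.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to `docs` (a list of token lists) by collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] is the proportion of
    topic k in document d; topic_word[k][w] is p(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = j | all other assignments).
                weights = [
                    (ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed point estimates of the two distributions.
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[k][w] + beta) / (nk[k] + V * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return doc_topic, topic_word
```

Each row of `doc_topic` is an article's distribution over groups, which is the kind of input the heatmap aggregation needs, and the highest-probability entries of each `topic_word[k]` are the "significant words" that label a group.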
The heatmap shows the joint distribution of the articles in the subject-methodology space.
Based on: LDA Article
Adjust the slider to shift between the absolute and relative relevance of the words within a topic. When the slider is closer to 1.0, more importance is given to the most common words in the group; when it is closer to 0.0, more importance is given to the words that appear most exclusively in that group.
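The slider appears to behave like the relevance parameter λ used by LDAvis-style visualizations; the page does not name it, so treat this as an assumption. Relevance blends a word's log probability within the topic with its lift, the ratio of that probability to the word's overall frequency in the corpus: relevance(w, t) = λ log p(w | t) + (1 - λ) log(p(w | t) / p(w)). A minimal sketch, where `relevance` and `rank_words` are illustrative names:

```python
import math


def relevance(topic_word_prob, word_prob, lam):
    """LDAvis-style relevance of each word for one topic (assumed metric).

    lam = 1.0 ranks words by p(w | t), i.e. how common they are in the topic;
    lam = 0.0 ranks words by lift p(w | t) / p(w), i.e. how exclusive they are.
    """
    return {
        w: lam * math.log(p) + (1 - lam) * math.log(p / word_prob[w])
        for w, p in topic_word_prob.items()
    }


def rank_words(topic_word_prob, word_prob, lam, n=10):
    """Return the n most relevant words for the topic at slider value lam."""
    scores = relevance(topic_word_prob, word_prob, lam)
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]
```

With a topic `{"model": 0.5, "gene": 0.3, "crispr": 0.2}` and overall frequencies `{"model": 0.4, "gene": 0.1, "crispr": 0.02}`, setting `lam=1.0` puts the frequent word `model` first, while `lam=0.0` puts `crispr` first because it is rare overall but concentrated in this topic.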