INDEX
Explanations
mentions of academic affiliations
references to "the" indicating locations or institutions
Negative Logits
eat           -0.70
adesh         -0.67
coins         -0.66
aji           -0.66
whenever      -0.65
terness       -0.64
occurs        -0.64
NetMessage    -0.64
stals         -0.63
pler          -0.63
Positive Logits
aforementioned    1.00
Institute         0.99
International     0.97
National          0.97
Department        0.92
University        0.91
Dominican         0.88
United            0.87
Philippines       0.85
latter            0.84
Activations Density 0.368%