INDEX
Explanations
instances of words that reveal political or social issues
Negative Logits
surpr         -0.53
CLASSIFIED    -0.50
NOW           -0.49
tal           -0.49
llor          -0.48
Voice         -0.48
roth          -0.47
rolet         -0.46
emo           -0.46
henko         -0.46
Positive Logits
lieu          1.15
accordance    1.07
favor         0.95
conjunction   0.94
vitro         0.88
order         0.88
favour        0.87
regards       0.87
efficiency    0.87
ordinate      0.86
Activations Density: 8.165%