INDEX
Explanations
references to political figures and their actions.
Negative Logits
Token          Logit
prest          -0.76
ILCS           -0.71
ordial         -0.64
Balt           -0.63
migr           -0.63
largeDownload  -0.63
cend           -0.61
Purg           -0.60
multipl        -0.60
pilgrimage     -0.59
Positive Logits
Token          Logit
inaction       1.00
unfairly       0.86
hypocrisy      0.80
harshly        0.78
leaders        0.76
failing        0.75
coward         0.73
shortcomings   0.73
detractors     0.73
critics        0.73
Activations Density: 0.226%