Explanations
references to governmental and institutional authority figures
Negative Logits
Token        Logit
estar        -0.16
mazon        -0.15
writ         -0.14
amil         -0.14
intermitt    -0.14
imageName    -0.14
herits       -0.13
Reply        -0.13
decrement    -0.13
uably        -0.13
Positive Logits
Token        Logit
imp          0.26
called       0.23
pledged      0.21
charged      0.20
dep          0.20
asked        0.19
invited      0.19
exh          0.19
ask          0.18
appealed     0.18
Activations Density: 0.073%