INDEX
Explanations
occurrences of official titles and roles related to authority or government
Negative Logits
aken            -0.15
eds             -0.15
uncompressed    -0.14
arga            -0.14
ouch            -0.13
etto            -0.13
ervlet          -0.13
ubb             -0.13
arg             -0.13
Dummy           -0.13
Positive Logits
declined        0.43
declines        0.35
refused         0.33
decline         0.32
wouldn          0.31
declining       0.29
decl            0.25
neither         0.25
deferred        0.24
couldn          0.24
Activations Density: 0.048%