INDEX
Explanations
mentions of government officials and bureaucratic terms
Negative Logits
Latter      -0.75
Stream      -0.71
natives     -0.67
Condition   -0.67
Contents    -0.64
Strip       -0.64
Nation      -0.63
avery       -0.63
Artists     -0.63
Methods     -0.62
Positive Logits
esses       1.02
ially       1.00
overseeing  0.93
alty        0.91
hips        0.91
tasked      0.90
ial         0.90
iate        0.88
ess         0.86
ials        0.86
Activations Density 0.959%