INDEX
Explanations
booleans, likely focusing on values 9 and 10
mentions of the National Rifle Association (NRA) and related terms
Negative Logits
lace        -0.95
acity       -0.78
MacArthur   -0.73
tein        -0.72
Corpus      -0.69
神          -0.69
bars        -0.67
ician       -0.66
iewicz      -0.65
terson      -0.65
Positive Logits
RA          1.08
VE          1.04
VEN         0.97
BB          0.94
ISE         0.93
GE          0.93
BA          0.93
BO          0.92
BILITY      0.91
HAM         0.90
Activations Density: 0.007%