Explanations
mentions of political topics or terms
mentions of policy-related terms
Negative Logits
龍喚士     -0.88
SAY        -0.81
Parts      -0.80
ACTED      -0.80
Hidden     -0.77
CVE        -0.75
ADRA       -0.75
TEXTURE    -0.75
OULD       -0.72
Render     -0.72
Positive Logits
pol        1.25
ipop       1.09
recip      0.85
ikarp      0.81
iton       0.80
atile      0.76
arity      0.76
atcher     0.73
igon       0.73
elbows     0.72
Activations Density 0.004%