INDEX
Explanations
criticism of political figures and institutions
Negative Logits
gow       -0.81
zac       -0.76
essee     -0.69
yip       -0.69
ovember   -0.67
ividual   -0.65
seys      -0.65
iewicz    -0.63
jri       -0.62
matter    -0.62
Positive Logits
collapsing    0.64
thumbnails    0.62
NVIDIA        0.58
Asset         0.57
Running       0.57
Rossi         0.56
Heart         0.56
Kend          0.55
ooters        0.55
intertwined   0.55
Activations Density: 0.006%