INDEX
Explanations
phrases related to real-life situations or facts
references to social, economic, and political realities
Negative Logits
fight     -0.70
nesty     -0.69
BUG       -0.65
Harris    -0.65
rip       -0.65
Naz       -0.65
ded       -0.62
lier      -0.62
idine     -0.62
raid      -0.62
Positive Logits
uggest     1.07
etter      1.07
atisf      1.01
omething   0.99
hops       0.99
cape       0.96
hips       0.95
ettings    0.94
poons      0.91
ongs       0.90
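Lists like the two above rank vocabulary tokens by how strongly the feature pushes their logits down or up. The sketch below is an illustration under an assumption, not this dashboard's documented method: it treats the logit effect as the feature's decoder direction multiplied by the model's unembedding matrix, then takes the k lowest and k highest entries. The names `top_logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical.

```python
import numpy as np


def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive token logit effects.

    Assumption (hypothetical): the effect of the feature on each token's
    logit is decoder_dir @ unembed, with decoder_dir of shape (d_model,)
    and unembed of shape (d_model, vocab_size).
    """
    effects = decoder_dir @ unembed  # one value per vocabulary token
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and 3 tokens.
decoder_dir = np.array([1.0, 0.0])
unembed = np.array([[0.5, -0.7, 1.07],
                    [0.0, 0.0, 0.0]])
vocab = ["a", "fight", "uggest"]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=1)
print(neg, pos)
```

The token strings in the real lists are subword pieces, which is why entries such as "uggest" and "ettings" appear without their leading characters.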
Activation Density 0.060%
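A density of 0.060% is a fraction of 0.0006, or roughly 6 of every 10,000 tokens. The sketch below assumes density is measured as the share of tokens on which the feature's activation is strictly above a threshold of zero; `activation_density` and its `threshold` parameter are hypothetical names, not part of this page.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds threshold (assumed definition)."""
    acts = np.asarray(activations)
    return float((acts > threshold).mean())


# Toy example: the feature fires on 6 of 10,000 tokens.
acts = np.zeros(10_000)
acts[:6] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```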