INDEX
Explanations
phrases related to social and political issues
Negative Logits
ILCS         -0.70
TEXTURE      -0.66
Flavoring    -0.63
ModLoader    -0.63
Ire          -0.62
Lerner       -0.60
ULL          -0.56
Natural      -0.56
Snap         -0.55
LET          -0.54
Positive Logits
..."         0.97
"}           0.96
!".          0.95
"?           0.91
").          0.91
?".          0.91
"/>          0.89
equals       0.89
.")          0.88
sucks        0.86
Activations Density: 0.187%