INDEX
Explanations
expressions involving swearing and emphasis
Negative Logits
NetMessage    -0.94
KY            -0.76
Cosponsors    -0.74
anwhile       -0.68
CRE           -0.68
ynthesis      -0.67
KEN           -0.65
OHN           -0.64
IDS           -0.64
chn           -0.64
Positive Logits
ibly          0.89
damned        0.84
darn          0.83
ation         0.83
damn          0.82
near          0.81
orse          0.79
atio          0.78
selves        0.76
warm          0.74
Activations Density: 0.024%