INDEX
Explanations
exclamatory statements expressing surprise, frustration, or emphasis
expressions of frustration or emphasis using the word "hell"
Negative Logits
Decre         -0.75
士            -0.72
States        -0.72
Short         -0.72
Random        -0.72
Population    -0.71
Recomm        -0.71
Vert          -0.70
Surveillance  -0.69
APD           -0.69
Positive Logits
enic          1.05
ishly         1.02
hole          0.91
oise          0.89
holes         0.88
urous         0.85
fuck          0.82
dump          0.82
aciously      0.81
hound         0.81
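Dashboards like this one commonly derive their positive and negative logit lists by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the resulting score. Whether that is exactly how the numbers above were produced is an assumption. The sketch below uses toy, hypothetical weights (`decoder_dir`, `unembed`, `vocab`) purely to illustrate the ranking step.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=3):
    """Rank tokens by the dot product of a feature's decoder direction
    with each unembedding column (hypothetical toy setup, not the
    dashboard's actual code)."""
    d_model = len(decoder_dir)
    scores = []
    for j, token in enumerate(vocab):
        # unembed is indexed [residual dimension][token]
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    ascending = sorted(scores, key=lambda pair: pair[1])
    negative = ascending[:k]           # most suppressed tokens
    positive = ascending[::-1][:k]     # most promoted tokens
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, -1.0, 0.0, 2.0],
    [0.0, 2.0, -2.0, 0.0],
]
vocab = ["a", "b", "c", "d"]
negative, positive = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

With these toy weights the scores are a = 1.0, b = 0.0, c = -1.0, d = 2.0, so the two most negative tokens are c and b and the two most positive are d and a.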
Activation Density: 0.009%
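The activation density figure above reads most naturally as the percentage of tokens on which the feature fires. That definition, including the zero threshold, is an assumption. Under it, 0.009% means the feature is active on roughly 9 of every 100,000 tokens, which fits the narrow "exclamatory 'hell'" explanation. A minimal sketch (`activation_density` is a hypothetical helper):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`
    (assumed definition; the dashboard's exact rule is not stated)."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 100,000 tokens, of which 9 activate the feature.
acts = [0.0] * 100_000
for idx in range(9):
    acts[idx] = 1.0
print(f"{activation_density(acts):.3f}%")
```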