Explanations
statements that express strong opinions or judgments
Negative Logits
essentially    -0.58
vooz           -0.54
Essentially    -0.54
<unused20>     -0.53
<unused41>     -0.53
<unused23>     -0.52
<unused43>     -0.52
<unused16>     -0.52
<pad>          -0.52
<unused8>      -0.52
Positive Logits
ModelExpression    0.62
hipó               0.42
swear              0.35
AndEndTag          0.35
typeorm            0.35
kain               0.34
espiritual         0.32
tiger              0.32
worse              0.31
cuad               0.31
Activations Density: 0.147%