Explanations
phrases that express a sense of pride
Negative Logits
Token             Logit
iaz               -0.74
ura               -0.74
soDeliveryDate    -0.73
igmatic           -0.69
eller             -0.67
MX                -0.65
soever            -0.65
guiActiveUn       -0.65
opus              -0.64
accounted         -0.64
Positive Logits
Token             Logit
what              0.92
how               0.86
sorts             0.81
having            0.81
this              0.77
them              0.77
him               0.76
course            0.75
ourselves         0.75
everything        0.74
Activations Density: 0.034%