INDEX
Explanations
expressions of admiration or positive descriptions
Negative Logits
/Internal      -0.17
adesh          -0.17
izzle          -0.16
ilon           -0.16
Ñĥда           -0.15
alnız          -0.15
ekim           -0.15
suce           -0.15
LayoutPanel    -0.14
айд            -0.14
Positive Logits
mind           0.32
simply         0.29
beyond         0.29
jaw            0.28
jaw            0.27
phen           0.26
breath         0.25
spell          0.25
unlike         0.25
Mind           0.25
Activations Density: 0.242%