Explanations
phrases indicating a desire for attention and validation
Negative Logits
lder              -0.16
.scalablytyped    -0.15
kee               -0.15
ais               -0.15
byn               -0.15
vens              -0.15
pivot             -0.14
å¨                -0.14
Alta              -0.14
é«                -0.14
Positive Logits
azzi              0.15
liter             0.15
anybody           0.14
918               0.14
ekl               0.14
enger             0.14
oji               0.14
į¨                0.14
.schedule         0.14
İ·                0.14
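The page does not say how these logit lists are produced. A common approach for sparse-autoencoder features, assumed here, is to take the dot product of the feature's decoder direction with each column of the model's unembedding matrix and report the most negative and most positive tokens. The sketch below illustrates that assumption in plain Python; the function name `logit_effects` and its parameters are hypothetical, not part of any dashboard API.

```python
from typing import List, Sequence, Tuple

def logit_effects(
    decoder_dir: Sequence[float],          # feature's decoder vector, length d_model (assumed input)
    unembed: Sequence[Sequence[float]],    # unembedding matrix laid out as [d_model][vocab]
    vocab: Sequence[str],                  # token string for each vocabulary column
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return (top-k negative, top-k positive) token logit effects."""
    d_model = len(decoder_dir)
    scores: List[Tuple[str, float]] = []
    for j, tok in enumerate(vocab):
        # Direct effect of the feature on token j's logit: decoder_dir . unembed[:, j]
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((tok, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative effects first
    positive = ranked[::-1][:k]    # most positive effects first
    return negative, positive
```

With a toy two-dimensional model and three tokens, `logit_effects([1.0, -1.0], [[0.5, 0.25, -0.25], [0.0, 0.5, 0.25]], ["a", "b", "c"], k=2)` ranks `c` as most suppressed and `a` as most promoted. Real dashboards may also normalize the decoder vector or fold in a final layer norm, which this sketch omits.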
Activation Density: 0.130%
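Activation density is read here, as an assumption, as the percentage of token positions in the evaluation data at which the feature's activation is above zero. Under that reading, 0.130% means the feature fires on roughly 13 of every 10,000 tokens. The hypothetical helper below computes the figure from a stream of per-token activations; the `threshold` parameter is illustrative and defaults to zero.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions where activation > threshold."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * firing / total
```

For example, 13 nonzero activations among 10,000 tokens gives `activation_density(...) == 0.13`, which formats as `0.130%` with three decimal places.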