Explanations
the word "too" and its variations indicating excessiveness
Negative Logits
Token      Logit
rael       -0.15
ry         -0.15
licht      -0.15
assez      -0.14
compat     -0.14
ávÄĽ       -0.14
phan       -0.14
happier    -0.14
rof        -0.14
äl         -0.14
Positive Logits
Token      Logit
much       0.34
soon       0.27
led        0.27
much       0.25
many       0.25
Much       0.25
Much       0.25
late       0.23
oooo       0.23
oooooooo   0.23
Activations Density 0.025%