INDEX
Explanations
special characters used to emphasize or express strong emotions
repeated symbols or characters within the text
Negative Logits
Token        Logit
fishes       -0.76
welf         -0.74
greens       -0.71
ãĥ¼ãĥĨãĤ£    -0.70
indemn       -0.68
pped         -0.67
adulthood    -0.65
charger      -0.65
nodd         -0.65
laun         -0.65
Positive Logits
Token        Logit
they         1.01
WHERE        1.00
yet          0.99
wait         0.96
BUT          0.94
etc          0.91
there        0.90
why          0.88
where        0.87
yeah         0.86
Activations Density 0.039%
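
The negative-logit token `ãĥ¼ãĥĨãĤ£` looks like encoding debris, but it may simply be a token shown in a byte-level BPE vocabulary's printable form. The sketch below assumes (the page does not say so) that the model uses a GPT-2-style byte-to-unicode table, and reverses that table to recover the underlying UTF-8 text. The helper names `bytes_to_unicode` and `decode_token` are illustrative and are not taken from this page.

```python
def bytes_to_unicode():
    """Build a GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token):
    """Map each printable character back to its byte, then decode as UTF-8."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


decoded = decode_token("ãĥ¼ãĥĨãĤ£")
print(decoded)
```

Under that assumption the token decodes to the katakana fragment `ーティ`. If the model uses a different tokenizer, the mapping above does not apply and the token should be read as displayed.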