Explanations
instances of the letter "l" in the text
Negative Logits
Token    Logit
闘    -0.70
ワン    -0.69
chell    -0.66
ーティ    -0.64
lished    -0.64
OLOGY    -0.62
hower    -0.62
ーク    -0.59
Credit    -0.59
Primal    -0.59
Positive Logits
Token    Logit
adders    1.39
ibrarian    1.33
ibr    1.27
ongh    1.17
onel    1.16
idd    1.15
izards    1.14
ivery    1.13
otto    1.13
ugs    1.12
Activations Density 0.011%