INDEX
Explanations
instances of the letter "d" in various cases (upper and lower)
Negative Logits
andro    -0.15
oleon    -0.15
ITCH     -0.15
INGER    -0.15
iero     -0.15
INVAL    -0.15
alon     -0.14
dro      -0.14
zek      -0.14
abwe     -0.14
Positive Logits
istinguished    0.33
istingu         0.29
ipl             0.28
rama            0.25
iversity        0.25
etermination    0.25
ign             0.23
istinguish      0.23
eline           0.23
etailed         0.23
Activations Density: 0.038%
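
The positive-logit tokens above all read as word fragments that are missing a leading "d", which fits the explanation that the feature responds to the letter "d". The following is a minimal sketch of that observation, not part of the dashboard itself: it prepends "d" to each listed token so the resulting strings can be inspected. The helper name `prepend_letter` is hypothetical, and the reading that the feature promotes continuations of "d" is an assumption drawn only from this listing.

```python
# Top positive-logit tokens, copied from the listing above.
positive_tokens = [
    "istinguished", "istingu", "ipl", "rama", "iversity",
    "etermination", "ign", "istinguish", "eline", "etailed",
]


def prepend_letter(tokens, letter="d"):
    """Return each token with `letter` prepended (hypothetical helper)."""
    return [letter + token for token in tokens]


# Assumption: the feature fires on "d" and boosts tokens that continue it.
completions = prepend_letter(positive_tokens)

for token, word in zip(positive_tokens, completions):
    print(f"{token!r} -> {word!r}")
```

Several results are complete English words (for example "distinguished", "drama", "diversity", "determination", "detailed"); the rest are prefixes of such words.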