INDEX
Explanations
instances of the word "char" in various contexts
Negative Logits
Token      Value
y          -0.18
yn         -0.18
yny        -0.17
el         -0.16
eh         -0.16
etas       -0.16
amiento    -0.16
ea         -0.15
grad       -0.15
ese        -0.15
Positive Logits
Token      Value
ismatic    0.28
coal       0.26
akter      0.26
isma       0.24
itably     0.24
tered      0.23
itable     0.23
acters     0.20
leston     0.19
lene       0.19
Activations Density 0.014%