Explanations
the presence of specific patterns in names or titles, particularly those that include the syllable "ek."
Negative Logits
_MACRO    -0.17
uld       -0.16
suit      -0.15
ipple     -0.15
ôt        -0.14
ught      -0.14
fuck      -0.14
shake     -0.14
zast      -0.14
åĽ        -0.13
Positive Logits
edis      0.20
unds      0.16
oppel     0.16
ora       0.15
ernels    0.14
thumbs    0.14
еÑĢин     0.14
edy       0.14
iel       0.14
ker       0.14
Activations Density 0.045%