INDEX
Explanations
Instances of demonstratives and definite articles, indicating a focus on specific concepts or entities.
Negative Logits
Token        Logit
Äį           -0.14
yn           -0.14
mne          -0.14
ule          -0.13
_PRIV        -0.13
note         -0.13
ens          -0.13
jin          -0.13
uer          -0.13
sacrifice    -0.13
Positive Logits
Token        Logit
ulumi        0.16
icious       0.16
ìĽĥ          0.16
mohla        0.14
arsi         0.14
ưỡng         0.14
icari        0.14
ì¶ķ          0.14
konkrét      0.14
fois         0.14
Activations Density 0.277%
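
For a sense of scale, a density of 0.277% corresponds to roughly 277 active positions per 100,000 tokens. The sketch below is one way such a figure could be computed, assuming that density means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" here means the share of positions where the
    feature fires at all; the dashboard's exact definition is not stated.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 277 of them active.
sample = [1.0] * 277 + [0.0] * (100_000 - 277)
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature this sparse is silent on the vast majority of tokens, so any such estimate depends heavily on how many tokens were sampled.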