INDEX
Explanations
instances of the pronoun "it"
Negative Logits
Ùĩ       -0.16
ufs      -0.16
lep      -0.15
edom     -0.15
ilde     -0.14
riers    -0.14
istik    -0.13
воно     -0.13
velle    -0.13
iefs     -0.13
Positive Logits
iner      0.32
chy       0.26
onto      0.26
clear     0.25
/th       0.21
possible  0.20
ches      0.18
unes      0.18
abund     0.18
easier    0.17
Activations Density 0.026%