Explanations
references to the word "us" in various contexts
Negative Logits
och       -0.16
ous       -0.15
endon     -0.15
quist     -0.15
himself   -0.15
arah      -0.15
lein      -0.14
cala      -0.14
uchi      -0.14
hattan    -0.14
Positive Logits
/me       0.23
/us       0.22
urious    0.20
eping     0.18
ury       0.17
SEL       0.17
tesy      0.16
opia      0.16
$č↵       0.16
’re       0.15
Activations Density: 0.061%