Explanations
pronouns and their variations
Negative Logits
Token     Logit
лок       -0.15
Hao       -0.15
ten       -0.14
Rose      -0.14
Matte     -0.14
unit      -0.14
isse      -0.14
ech       -0.14
Reuse     -0.14
çļ        -0.14
Positive Logits
Token     Logit
uder      0.16
_consts   0.15
rive      0.15
ocu       0.14
_vert     0.14
nika      0.14
Dear      0.14
olursa    0.14
lamaz     0.14
uerdo     0.14
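The two lists rank vocabulary tokens by how strongly the feature pushes each token's output logit down or up. The sketch below is a minimal illustration that assumes (this is not stated in the listing) the per-token effect is the feature's decoder direction dotted with each column of the model's unembedding matrix. The vectors are tiny made-up examples, and `logit_effects` and `top_and_bottom` are hypothetical helper names.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect[j] = decoder_dir . unembed[:, j] (decoder direction projected
# onto the unembedding). All numbers below are invented for illustration.

def logit_effects(decoder_dir, unembed):
    """Dot the decoder direction (length d_model) with each vocab column of unembed."""
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [
        sum(decoder_dir[k] * unembed[k][j] for k in range(d_model))
        for j in range(vocab)
    ]

def top_and_bottom(effects, tokens, k):
    """Return the k most negative and k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, 2.0]                 # d_model = 2
unembed = [[1.0, -1.0, 0.5, 0.0],        # shape d_model x vocab
           [0.5, 0.0, -1.0, 2.0]]
effects = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(effects, tokens, 2)
print(negative)  # [('c', -1.5), ('b', -1.0)]
print(positive)  # [('d', 4.0), ('a', 2.0)]
```

Here the "Negative Logits" list corresponds to `negative` and the "Positive Logits" list to `positive`, each truncated to the top 10 in the listing above.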
Activation Density: 0.021%
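Activation density is usually read as the share of token positions on which the feature fires. At 0.021% that works out to about 21 of every 100,000 tokens, so this feature is very sparse. The minimal sketch below assumes "fires" means an activation strictly above zero. The threshold parameter and the `activation_density` name are illustrative, not taken from the dashboard.

```python
# Minimal sketch: activation density as a percentage of token positions.
# Assumption: a feature counts as active when its activation exceeds `threshold`
# (0.0 by default); the dashboard's exact rule is not given in the listing.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 21 active positions out of 100,000 tokens -> about 0.021%
acts = [0.0] * 100_000
for i in range(21):
    acts[i * 1000] = 1.3
print(activation_density(acts))
```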