INDEX
Explanations
elements related to relationships and familial structures
Negative Logits
apos      -0.16
ersh      -0.15
gom       -0.15
gren      -0.14
orpor     -0.14
enez      -0.14
леÑĩ      -0.14
iskey     -0.14
lus       -0.14
crib      -0.14
Positive Logits
what      0.17
studio    0.17
whose     0.14
ovel      0.14
ten       0.14
rect      0.14
kk        0.14
when      0.14
c         0.14
b         0.13
Activations Density: 0.394%