INDEX
Explanations
references to the Sedin twins
Negative Logits
Token     Logit
portun    -0.15
ourn      -0.14
Hàng      -0.14
iyan      -0.14
stown     -0.14
pin       -0.14
pine      -0.14
tere      -0.14
azi       -0.13
.refs     -0.13
Positive Logits
Token     Logit
alia      0.24
uction    0.23
uctive    0.23
sed       0.22
uced      0.21
Sed       0.21
tember    0.19
uces      0.18
ition     0.18
angkan    0.18
Activations Density 0.005%