INDEX
Explanations
expressions related to familial and social connections
NEGATIVE LOGITS
itizen      -0.15
lio         -0.14
abilit      -0.14
okies       -0.14
ubic        -0.14
ляд         -0.13
батьків     -0.13
ibel        -0.13
oretical    -0.13
thern       -0.13
POSITIVE LOGITS
ranks        0.18
ivery        0.16
st           0.15
things       0.15
مباش         0.15
Dud          0.15
other        0.14
themselves   0.14
swick        0.14
akan         0.14
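The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits up (positive) or down (negative). Below is a minimal sketch of how such lists are commonly produced. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The names `decoder_dir`, `unembed`, `logit_effects` and `top_logits` are hypothetical, and the toy vectors and tokens are invented for illustration only:

```python
def logit_effects(decoder_dir, unembed):
    """Score each token as the dot product of the feature's decoder
    direction with the token's unembedding vector (assumed convention)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_logits(scores, k):
    """Return (positive, negative) lists of (token, score) pairs, k each.
    Positive is ordered largest-first, negative most-negative-first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy example (invented values): 3-dim residual stream, 4-token vocabulary.
decoder_dir = [0.5, -0.2, 0.1]
unembed = {
    "tok_a": [0.3, 0.0, 0.2],
    "tok_b": [0.1, -0.1, 0.0],
    "tok_c": [-0.2, 0.1, 0.0],
    "tok_d": [0.0, 0.3, -0.1],
}
pos, neg = top_logits(logit_effects(decoder_dir, unembed), k=2)
print([t for t, _ in pos], [t for t, _ in neg])
```

Sorting once and slicing from both ends yields both lists in a single pass; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.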
Activations Density 0.058%
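A density of 0.058% means the feature fires on roughly 58 out of every 100,000 token positions. A minimal sketch of that calculation follows. It assumes, without confirmation from this page, that "fires" means an activation strictly above zero. The function name `activation_density` and the toy activation list are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation
    exceeds `threshold` (assumed definition of "fires")."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy data (invented): 100,000 positions, 58 of them with a nonzero activation.
acts = [0.0] * 99_942 + [1.3] * 58
print(f"{activation_density(acts):.3f}%")  # 0.058%
```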