Explanations
words related to family members or connections
references to family members and relatives
Negative Logits
Token        Logit
Cola         -0.80
lite         -0.78
Effective    -0.74
inem         -0.66
yss          -0.66
okin         -0.65
Indust       -0.65
oker         -0.65
atan         -0.65
lins         -0.64
Positive Logits
Token        Logit
relatives    1.02
hips         0.96
hetical      0.83
hood         0.81
reunion      0.80
lia          0.78
ships        0.75
ilial        0.75
abroad       0.75
reun         0.74
Activations Density: 0.082%