INDEX
Explanations
references to friendships and relationships
Negative Logits
iversit     -0.16
162         -0.15
elu         -0.15
ünchen      -0.15
iddles      -0.15
indsight    -0.14
atonin      -0.14
weep        -0.14
undry       -0.14
jected      -0.14
Positive Logits
forged         0.27
severed        0.25
formed         0.24
solid          0.23
cement         0.23
established    0.23
建立           0.22
based          0.22
built          0.21
Established    0.21
Activations Density 0.152%