Explanations
references to familial and interpersonal relationships
Negative Logits
antis      -0.19
ipop       -0.15
imb        -0.15
ampie      -0.14
ilan       -0.14
znik       -0.14
оль        -0.14
的情       -0.13
ylko       -0.13
の子       -0.13
Positive Logits
his         0.20
his         0.19
/actions    0.17
Kanun       0.15
Ulus        0.15
enan        0.15
283         0.14
whom        0.14
他的        0.14
"+"         0.14
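
Lists like the negative and positive logits above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and ranking the vocabulary by the resulting score. The source does not say how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_effects`, the two-dimensional toy weights, and the four-token vocabulary are all hypothetical and chosen for illustration; they are not the real model weights.

```python
def top_logit_effects(direction, unembed, vocab, k=2):
    """Rank vocabulary tokens by the dot product of a feature direction
    with each token's unembedding row (assumed method, see lead-in)."""
    scores = [sum(d * w for d, w in zip(direction, row)) for row in unembed]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, lowest first
    positive = ranked[::-1][:k]  # most promoted tokens, highest first
    return negative, positive


# Hypothetical toy data: invented weights, not taken from any real model.
vocab = ["his", "whom", "antis", "ipop"]
unembed = [
    [0.20, 0.1],
    [0.14, -0.3],
    [-0.19, 0.5],
    [-0.15, 0.2],
]
direction = [1.0, 0.0]  # hypothetical feature output direction

negative, positive = top_logit_effects(direction, unembed, vocab)
print("negative:", negative)
print("positive:", positive)
```

Under this reading, a positive score means the feature pushes the model toward emitting that token when it fires, and a negative score means it pushes away from it.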
Activations Density 0.380%
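
An activation density figure of this kind is usually read as the fraction of sampled tokens on which the feature fires at all. The sketch below shows that arithmetic under that assumption. The function name `activation_density`, the zero threshold, and the sample of 5000 tokens with 19 active ones are hypothetical; the source does not state its sample size or threshold.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 5000 tokens, 19 of which activate the feature.
sample = [0.0] * 4981 + [0.5] * 19
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.380%"
```

A density this low would mean the feature is sparse: it stays silent on the large majority of tokens and fires only on a narrow slice of inputs.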