Explanations
references to familial or personal relationships
Negative Logits
que     -0.15
ovit    -0.15
обо     -0.15
McG     -0.14
esel    -0.14
Both    -0.13
udoku   -0.13
ez      -0.13
Dank    -0.13
】,     -0.13
Positive Logits
others       0.31
other        0.24
others       0.23
Others       0.22
Others       0.20
crew         0.19
his          0.18
其他         0.17
colleagues   0.17
autres       0.16
Activations Density: 0.077%