INDEX
Explanations
the word 'and' or conjunctions
Negative Logits
ourselves     -0.67
myself        -0.46
myself        -0.44
раздо         -0.44
likopter      -0.44
емся          -0.43
IsRequired    -0.43
нами          -0.42
ıyoruz        -0.42
próprias      -0.40
Positive Logits
he            1.23
she           1.15
they          1.15
он            1.02
она           1.02
они           0.95
він           0.94
वह            0.93
вона          0.91
вони          0.88
Activations Density: 3.663%