Explanations
references to systems, orders, and general conditions in discussions
Follows discourse markers or punctuation
first person explanation
Negative Logits
my        -1.13
minha     -1.01
我的      -0.98
my        -0.97
mijn      -0.96
meu       -0.95
Mijn      -0.94
minhas    -0.92
meine     -0.90
meus      -0.89
Positive Logits
we        1.60
I         1.50
We        1.11
we        1.09
We        1.06
я         0.95
мы        0.84
I         0.77
WE        0.76
我就      0.72
Activations Density: 0.812%