Explanations
comparative language and indicators of difference
Negative Logits
Token    Logit
I        -0.99
         -0.95
you      -0.94
guys     -0.79
is       -0.76
/        -0.76
.        -0.75
         -0.75
it       -0.74
we       -0.71
Positive Logits
Token       Logit
―――――       1.23
ſind        1.23
་་          1.23
itſelf      1.09
auffi       1.06
doubtnut    1.05
eſſ         1.05
ſelf        1.05
quæ         1.04
iſt         1.03
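The negative and positive logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The page does not say how the scores are computed. A common approach for sparse-autoencoder feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and then keep the k highest and k lowest entries. The sketch below illustrates only that ranking step. `feature_direction`, `unembed`, and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(feature_direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature direction (length d_model) through an unembedding
    matrix stored as d_model rows of d_vocab entries; one score per token.

    Assumption: dashboard logit scores are computed this way (not stated on the page).
    """
    d_vocab = len(unembed[0])
    return [
        sum(feature_direction[i] * unembed[i][t] for i in range(len(feature_direction)))
        for t in range(d_vocab)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    pairs = list(zip(vocab, scores))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    vocab = ["I", "you", "ſind", "iſt"]
    unembed = [
        [-1.0, -0.5, 1.0, 0.5],  # row for residual dimension 0
        [0.0, -0.5, 0.2, 0.5],   # row for residual dimension 1
    ]
    feature_direction = [1.0, 1.0]
    scores = logit_effects(feature_direction, unembed)
    pos, neg = top_and_bottom(scores, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Python's `sorted` is stable, so tokens with tied scores keep their vocabulary order. That matches how ties such as the three 1.23 entries can appear in a fixed order on the page.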
Activation Density: 7.663%
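Activation density is typically the fraction of sampled tokens on which the feature is active. The page reports only the figure, 7.663%, and gives neither the sample nor the activation threshold. The minimal sketch below therefore assumes that "active" means a strictly positive activation. The activation values are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: the threshold of 0.0 is illustrative; the page does not state one.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical activations for 8 tokens; 2 of them are nonzero.
    acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.4, 0.0, 0.0]
    print(f"density: {activation_density(acts) * 100:.3f}%")  # prints "density: 25.000%"
```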