Explanations
words related to statistical comparisons and ratios
Negative Logits
Token      Logit
None       -0.55
           -0.54
none       -0.53
autres     -0.53
None       -0.50
inga       -0.50
neither    -0.50
these      -0.49
Any        -0.47
none       -0.47
Positive Logits
Token       Logit
everyone    1.22
everybody   1.15
everything  1.08
Everybody   1.07
everybody   1.03
EVERY       1.03
Everybody   1.02
everyone    1.00
Everyone    0.98
Everyone    0.98
Activations Density: 0.202%