Explanations
references to individuals and collective groups within various contexts
Negative Logits
ing          -0.94
ة            -0.80
</sub>       -0.79
Paz          -0.78
Paz          -0.78
Lawson       -0.77
slidesPer    -0.75
います       -0.72
+}$          -0.70
.            -0.70
Positive Logits
theless      1.34
Opus         1.07
Everybody    1.06
Everybody    1.05
doughnuts    1.04
everybody    0.99
Cæsar        0.97
Anybody      0.97
soever       0.97
Ayres        0.96
Activations Density 0.165%