INDEX
Explanations
references to measuring or quantifying items or actions
Negative Logits
Token   Logit
de      -0.47
        -0.47
ra      -0.46
no      -0.46
to      -0.46
lo      -0.43
na      -0.43
or      -0.42
(       -0.42
.       -0.41
Positive Logits
Token     Logit
للمعارف   1.20
vez       1.01
eens      1.00
__":      0.95
gången    0.94
time      0.94
"):       0.94
للاسماء   0.92
gangen    0.90
fois      0.88
Activations Density: 0.093%