Explanations
phrases indicating changes or amendments in policies or programs
Negative Logits
rop           -0.16
ilet          -0.14
íĸ¥           -0.14
new           -0.13
çͲ            -0.13
ugin          -0.13
ument         -0.13
_extensions   -0.13
ÏĦεÏģ         -0.13
issing        -0.12
Positive Logits
ếp            0.17
existing      0.17
existing      0.17
ascus         0.15
estruct       0.15
abez          0.15
ritos         0.15
าหล           0.14
acceptable    0.14
originally    0.14
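The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The sketch below shows one common way such a ranking is produced. It assumes that the per-token effect is the feature's decoder direction multiplied by the model's unembedding matrix, giving one score per vocabulary entry, and that the k lowest and k highest scores are listed. The function name `top_logits` and the toy inputs are hypothetical. This illustrates the ranking step and does not reproduce how this dashboard was generated.

```python
import numpy as np

def top_logits(decoder_direction, unembed, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumption (hypothetical setup): effect = decoder_direction @ unembed,
    where decoder_direction has shape [d_model] and unembed has shape
    [d_model, d_vocab].
    """
    effects = decoder_direction @ unembed        # one score per vocabulary token
    order = np.argsort(effects)                  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
decoder_direction = np.array([1.0, 0.0])
unembed = np.array([[0.3, -0.2, 0.1, -0.5],
                    [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logits(decoder_direction, unembed, vocab, k=2)
print(neg)  # [('d', -0.5), ('b', -0.2)]
print(pos)  # [('a', 0.3), ('c', 0.1)]
```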
Activations Density 0.145%
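An activation density of 0.145% means the feature is active on about 1.45 out of every 1,000 token positions. The sketch below assumes density is the fraction of sampled tokens whose activation is strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires (activation > threshold)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: 145 active positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:145] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.145%
```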