INDEX
Explanations
phrases that indicate instructional or procedural content
Negative Logits
bris        -0.16
кав         -0.15
ulus        -0.15
ロー        -0.14
кад         -0.14
enus        -0.14
arsi        -0.14
جا          -0.13
gens        -0.13
uat         -0.13
Positive Logits
oda         0.16
further     0.16
915         0.15
897         0.14
weiter      0.14
äft         0.14
istrovství  0.14
Friedman    0.14
Moran       0.13
ëł          0.13
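The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way to build such lists is to take the dot product of the feature's decoder direction with each token's unembedding vector, then keep the k highest and lowest scores. This page does not state its method, so treat that as an assumption. The sketch below is a pure-Python illustration on a tiny made-up vocabulary. `TOY_UNEMBED`, `logit_scores`, `top_and_bottom`, and all vector values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each token maps to its unembedding vector (one column of W_U).
TOY_UNEMBED: Dict[str, List[float]] = {
    "further": [0.2, 0.0],
    "weiter": [0.1, -0.1],
    "bris": [-0.1, 0.1],
    "oda": [0.0, 0.0],
}


def logit_scores(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature direction with every token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical feature direction in the same 2-D toy space.
    scores = logit_scores([1.0, -0.5], TOY_UNEMBED)
    positive, negative = top_and_bottom(scores, k=2)
    for token, score in positive:
        print(f"+ {token:<10}{score:.2f}")
    for token, score in negative:
        print(f"- {token:<10}{score:.2f}")
```

In a real model, the vocabulary has tens of thousands of tokens, so the same ranking is usually done with a single matrix-vector product rather than a Python loop.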
Activation Density: 0.101%
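Activation density is read here as the fraction of token positions on which the feature fires. At 0.101%, that is roughly 1 token in every 990, since 1 / 0.00101 ≈ 990. The page does not specify the activation threshold or the dataset. This sketch therefore assumes a threshold of 0 and uses a toy sample. The `activation_density` function is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Toy sample: the feature fires on 1 of 1,000 tokens.
    sample = [0.0] * 999 + [2.3]
    print(f"Activation density: {activation_density(sample):.3%}")  # prints 0.100%
```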