INDEX
Explanations
references to locations or the concept of being somewhere
Negative Logits
-------               -0.71
__*/                  -0.62
évaluateur            -0.60
(no visible token)    -0.60
__':                  -0.60
nonUne                -0.60
AssemblyTitle         -0.59
propOrder             -0.58
الإنجليزية            -0.57
Билгалдахарш          -0.57
Positive Logits
else                  0.56
near                  0.50
near                  0.40
around                0.38
along                 0.35
isOk                  0.34
between               0.34
orizz                 0.33
nær                   0.32
abroad                0.32
Activation Density: 0.119%