Explanations
references to location-based contexts or indications of presence
Negative Logits
bezeichneter    -0.68
perchè          -0.64
Hadrian         -0.60
principalTable  -0.57
Sigism          -0.57
持ち            -0.55
Phry            -0.54
Muske           -0.54
нему            -0.53
bogor           -0.52
Positive Logits
وفي     0.88
в       0.87
midst   0.86
IN      0.82
ใน      0.81
Dalam   0.81
dalam   0.80
וב      0.79
isIn    0.79
in      0.78
Activations Density 0.011%