INDEX
Explanations
references to locations or places
Negative Logits
ively     -0.17
ute       -0.17
sp        -0.16
еÑĢж      -0.15
s         -0.15
amar      -0.15
urb       -0.15
èĹ¥       -0.14
spath     -0.14
aurant    -0.14
Positive Logits
else      0.17
Ymd       0.15
ennen     0.15
orus      0.15
avel      0.15
dÄĽ       0.15
Äijây     0.14
Cousins   0.13
bij       0.13
osi       0.13
Activations Density: 0.029%