Explanations
references to geographical locations and directions
Negative Logits
higher      -0.18
higher      -0.17
overhead    -0.16
alo         -0.16
hd          -0.16
oben        -0.15
ichert      -0.15
頂          -0.15
顶          -0.15
висок       -0.15
Positive Logits
bottom      0.35
below       0.34
below       0.32
bottom      0.30
-bottom     0.29
Bottom      0.28
Bottom      0.28
BOTTOM      0.27
下          0.27
Below       0.27
Activations Density 0.134%