Explanations
phrases that describe locations or positions relative to other objects
Negative Logits
Anſ        -0.68
✭✭         -0.68
houſe      -0.68
Efq        -0.66
ſelf       -0.65
Diſ        -0.61
Reſ        -0.61
jadx       -0.58
DZ         -0.58
ſelves     -0.57
Positive Logits
Beneath          0.72
Near             0.70
neath            0.70
near             0.70
Beneath          0.68
devant           0.68   (French: "in front of")
Near             0.66
возле            0.66   (Russian: "near")
LabelTagHelper   0.62
derrière         0.61   (French: "behind")
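The page does not say how these logit lists are produced. A common approach, assumed here and not confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and keep the highest and lowest scoring vocabulary tokens. The sketch below uses plain Python; `logit_weights`, `top_logits`, and all the toy numbers are hypothetical and do not reproduce the values above.

```python
# Hypothetical sketch: rank the tokens a feature pushes up or down.
# Assumption: unembed has shape [d_model][vocab_size] (like W_U), and a
# token's logit weight is the dot product of the feature's decoder
# direction with that token's unembedding column.

def logit_weights(decoder_dir, unembed):
    """Return one weight per vocabulary token."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[j] * unembed[j][i] for j in range(len(decoder_dir)))
        for i in range(vocab_size)
    ]


def top_logits(weights, vocab, k=10):
    """Return (top_positive, top_negative) lists of (token, weight) pairs."""
    pairs = list(zip(vocab, weights))
    # Largest weights first: tokens the feature makes more likely.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Smallest weights first: tokens the feature makes less likely.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (illustrative only).
    vocab = ["near", "beneath", "house", "self"]
    unembed = [[0.4, 0.2, -0.1, -0.3],
               [0.2, 0.4, -0.2, 0.0]]
    pos, neg = top_logits(logit_weights([1.0, 0.5], unembed), vocab, k=2)
    print([t for t, _ in pos], [t for t, _ in neg])
```

Sorting the full list twice is fine for a sketch; for a real vocabulary of tens of thousands of tokens, `heapq.nlargest` / `heapq.nsmallest` would avoid a full sort.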
Activation Density 0.228%
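Assuming activation density means the percentage of tokens on which the feature's activation is above zero (the page does not define it), 0.228% corresponds to the feature firing on roughly 1 in 440 tokens. A minimal sketch of that calculation follows; the function name and the `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens on
# which a feature's activation exceeds a threshold (assumed to be 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy activations over four tokens: the feature fires on one of them.
    print(activation_density([0.0, 0.0, 1.3, 0.0]))  # 25.0
```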