INDEX
Explanations
directional language related to movement and position
Negative Logits
Token         Logit
Filler        -0.58
торая         -0.53
</>           -0.51
he            -0.51
she           -0.50
witzerland    -0.48
ANCO          -0.48
Reed          -0.48
tellte        -0.48
présence      -0.48
Positive Logits
Token              Logit
towards            0.83
towards            0.82
toward             0.80
ویکیپدیای          0.75
ParallelGroup      0.75
toward             0.72
westward           0.72
ConstraintMaker    0.69
outwards           0.69
rungsseite         0.68
Activations Density: 0.235%