Explanations
References to direction or movement toward a goal or endpoint.
Negative Logits
Token        Logit
Süß          -0.42
enfans       -0.41
saja         -0.40
deleteItem   -0.36
nô           -0.36
ektiv        -0.36
Canaria      -0.34
الدراسه      -0.34
않           -0.33
fools        -0.33
Positive Logits
Token        Logit
towards      1.91
toward       1.88
Toward       1.84
towards      1.81
Towards      1.80
toward       1.78
Towards      1.73
Toward       1.72
hacia        1.29
verso        1.17
Activations Density: 0.137%