Explanations
verbs indicating movement or direction
Negative Logits
Token        Value
ayrıca       0.42
Accessory    0.41
)$.          0.40
সিনে         0.38
並且         0.38
vendar       0.38
}$).         0.37
ospin        0.37
asociada     0.37
þei          0.36
Positive Logits
Token        Value
into         0.77
onto         0.61
into         0.59
Into         0.58
Into         0.56
up           0.48
towards      0.47
новую        0.45
увагу        0.45
vào          0.44
Activations Density: 0.028%