Explanations
verbs followed by common words
Negative Logits
Token           Logit
be              0.55
of              0.52
et              0.50
in              0.50
was             0.48
ad              0.48
odore           0.46
of              0.46
Client          0.46
mselves         0.45
Positive Logits
Token           Logit
носит           0.66
д               0.66
ставляет        0.64
edilir          0.62
habituellement  0.60
merav           0.60
становится      0.59
modifies        0.59
থাকে            0.58
relieves        0.57
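The two lists above rank vocabulary tokens by how strongly the feature pushes their logits up or down. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction onto the unembedding matrix and keep the top and bottom k entries. The sketch below does that with NumPy; `feature_logit_effects`, `top_logit_tokens`, `W_U`, and the toy numbers are hypothetical and for illustration only. The page prints negative-logit values without a sign, whereas this sketch returns them as negative numbers.

```python
import numpy as np


def feature_logit_effects(decoder_direction, W_U):
    """Project a feature's decoder direction onto the unembedding (hypothetical helper).

    decoder_direction: shape (d_model,)
    W_U: unembedding matrix, assumed shape (d_model, d_vocab)
    Returns one score per vocabulary token, shape (d_vocab,).
    """
    return np.asarray(decoder_direction) @ np.asarray(W_U)


def top_logit_tokens(logit_effects, vocab, k=10):
    """Return the k most positive and k most negative (token, value) pairs."""
    logit_effects = np.asarray(logit_effects)
    order = np.argsort(logit_effects)  # indices from most negative to most positive
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: a 2-dim residual stream and a 5-token vocabulary (made-up numbers).
vocab = ["be", "of", "modifies", "relieves", "the"]
W_U = np.array([
    [-0.55, -0.52, 0.59, 0.57, 0.01],
    [0.00, 0.00, 0.00, 0.00, 0.00],
])
effects = feature_logit_effects(np.array([1.0, 0.0]), W_U)
positive, negative = top_logit_tokens(effects, vocab, k=2)
print(positive)  # [('modifies', 0.59), ('relieves', 0.57)]
print(negative)  # [('be', -0.55), ('of', -0.52)]
```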
Activation Density: 0.343%
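The density figure reads most naturally as a frequency. Assuming it is the percentage of tokens in a sample on which the feature's activation is above zero (the page does not define it), 0.343% means the feature fires on roughly 3.4 of every 1,000 tokens. A minimal sketch of that computation, with a hypothetical `activation_density` function and an illustrative `threshold` parameter:

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly greater than `threshold`.

    activations: one activation value per token (any shape; counted over all entries).
    The "active means > threshold" definition is an assumption for illustration.
    """
    activations = np.asarray(activations, dtype=float)
    active = np.count_nonzero(activations > threshold)
    return 100.0 * active / activations.size


# Toy example: the feature fires on 2 of 10 tokens.
acts = [0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2, 0.0]
print(activation_density(acts))  # 20.0
```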