INDEX
Explanations
actions or activities described in various contexts
Negative Logits
urette       -0.16
спок         -0.15
áš           -0.15
ья           -0.15
paring       -0.15
UP           -0.15
avra         -0.14
люб          -0.14
rase         -0.14
uropean      -0.14
Positive Logits
away          0.29
everything    0.28
wonders       0.27
violence      0.24
unto          0.24
le            0.24
whatever      0.23
battle        0.23
justice       0.23
right         0.22
Activation Density: 0.067%