Explanations
the word "stand" or its variations in different contexts
Negative Logits
Token     Logit
atern     -0.19
urator    -0.16
ing       -0.16
нам       -0.14
nod       -0.14
LOC       -0.14
Aeros     -0.14
zym       -0.14
icans     -0.14
odst      -0.14
Positive Logits
Token     Logit
-alone     0.40
alone      0.38
Alone      0.27
arde       0.26
still      0.26
alone      0.25
ards       0.22
mixer      0.21
lone       0.21
ings       0.21
Activation Density: 0.009%