Explanations
phrases that indicate various types of actions or responses related to lists or categorization
Negative Logits
mostly
-0.19
всех
-0.19
generally
-0.18
通常
-0.18
何か
-0.18
mostly
-0.18
always
-0.17
většinou
-0.17
largely
-0.17
Mostly
-0.17
Positive Logits
even
0.40
simply
0.35
even
0.34
sogar
0.34
outright
0.32
甚至
0.32
downright
0.31
dokonce
0.29
Even
0.27
Even
0.27
Activations Density 0.490%