INDEX
Explanations
phrases indicating consistency and reliability in behavior or actions
Negative Logits
often        -0.20
ikke         -0.18
tidak        -0.18
Often        -0.18
altogether   -0.17
không        -0.17
не           -0.17
oder         -0.17
souvent      -0.17
artık        -0.17
Positive Logits
been         0.25
cky          0.20
greens       0.20
seemed       0.19
seems        0.19
seem         0.19
ready        0.18
green        0.18
gonna        0.18
-on          0.17
Activations Density 0.067%