Explanations
expressions related to feelings of discomfort and the need for guidance in social or personal contexts
Negative Logits
both             -0.71
Various          -0.69
not              -0.68
而不是           -0.66
plutôt           -0.66
various          -0.66
Various          -0.65
autorytatywna    -0.65
而非             -0.64
χι               -0.63
Positive Logits
nor              2.58
anymore          2.23
nor              1.85
anything         1.74
anywhere         1.56
nici             1.54
anything         1.54
any              1.52
whatsoever       1.49
Nor              1.48
Activations Density 2.293%