INDEX
Explanations
negations or terms indicating a lack of applicability or suitability
"not" followed by a positive quality
Negative Logits
unofficial      -0.42
gotta           -0.39
prohibido       -0.39
butuh           -0.37
official        -0.36
fallu           -0.36
Incomplete      -0.35
MOST            -0.35
ESPEC           -0.35
ESPECIAL        -0.34
Positive Logits
satisfactory    0.71
satisfactorily  0.65
simple          0.62
straightforward 0.62
TagMode         0.59
adequately      0.58
trivial         0.57
simply          0.57
solely          0.54
zufried         0.53
Activations Density 0.304%