Explanations
leading to negative consequences
Negative Logits
slightly       0.61
somewhat       0.59
Slightly       0.59
légèrement     0.57
slight         0.56
agak           0.56
Certain        0.56
particular     0.55
particolari    0.55
Slight         0.55
Positive Logits
unpredictable    0.78
worse            0.68
disaster         0.68
unpredict        0.68
seriously        0.66
disastrous       0.66
meaningless      0.65
useless          0.64
unreliable       0.64
Worse            0.64
Activations Density 0.038%