INDEX
Explanations
phrases describing challenges or obstacles
Negative Logits
Token               Logit
%)$                 -0.84
simple              -0.80
ساده                -0.77
MessageTagHelper    -0.77
easily              -0.74
easy                -0.74
łat                 -0.73
einfachen           -0.73
Simple              -0.73
SIMPLE              -0.73

Positive Logits
Token               Logit
Schwier             0.79
Difficulty          0.73
difficulty          0.71
hardness            0.69
Difficult           0.69
Difficult           0.68
Difficulties        0.68
impossible          0.67
difficult           0.63
Harder              0.63

Activations Density 0.053%