INDEX
Explanations
phrases indicating inability to perform a specific action
phrases indicating negation or inability
Negative Logits
senal         -0.68
Actions       -0.66
artifacts     -0.65
ILA           -0.65
Appears       -0.62
amy           -0.60
esting        -0.59
mite          -0.59
marks         -0.58
Transparency  -0.58
Positive Logits
afford        1.17
cope          0.97
overcome      0.92
conceive      0.91
muster        0.90
convince      0.90
foresee       0.89
comprehend    0.86
persuade      0.85
feas          0.83
Activations Density 0.113%