INDEX
Explanations
verbs and phrases associated with actions and their effectiveness
"do" followed by a negative word
do not / do nothing
Negative Logits
متعلقه          -0.70
<bos>           -0.67
Portail         -0.66
HasFactory      -0.63
TagHelpers      -0.63
AutoScale       -0.62
useAppContext   -0.62
awtextra        -0.62
Cyfeiriadau     -0.60
الحره           -0.60
Positive Logits
justice         0.68
little          0.62
away            0.61
violence        0.61
credit          0.60
nothing         0.58
wonders         0.54
violence        0.53
oming           0.53
harm            0.51
Activations Density: 0.130%