    Explanations

    verbs and phrases associated with actions and their effectiveness

    "do" followed by a negative word

    Negative Logits
        Token               Logit
        " متعلقه"           -0.70
        "<bos>"             -0.67
        "Portail"           -0.66
        " HasFactory"       -0.63
        "TagHelpers"        -0.63
        "AutoScale"         -0.62
        " useAppContext"    -0.62
        "awtextra"          -0.62
        "Cyfeiriadau"       -0.60
        " الحره"            -0.60

    Positive Logits
        Token               Logit
        " justice"           0.68
        " little"            0.62
        " away"              0.61
        " violence"          0.61
        " credit"            0.60
        " nothing"           0.58
        " wonders"           0.54
        "violence"           0.53
        "oming"              0.53
        " harm"              0.51

    Activation Density: 0.130%

    No Known Activations