INDEX
    Explanations

    phrases indicating capability or ability

    Negative Logits
    Token            Logit
    "<bos>"          -0.60
    " the"           -0.54
    " your"          -0.48
    " Schwartz"      -0.48
    " in"            -0.46
    " McCulloch"     -0.45
    " these"         -0.43
    " בח"            -0.43
    " this"          -0.43
    "The"            -0.42
    Positive Logits
    Token                Logit
    "Able"               0.99
    " Able"              0.96
    " able"              0.93
    "unable"             0.80
    "IsMutable"          0.76
    "Unable"             0.76
    " unable"            0.75
    "tagHelperRunner"    0.74
    " Unable"            0.74
    "<unused43>"         0.74
    Activation Density: 0.011%

    No Known Activations