INDEX
    Explanations

    phrases related to ability or capability

    assertions of capability or potential

    Negative Logits
    Token            Logit
    " Federation"    -0.73
    "furt"           -0.73
    "ele"            -0.62
    " Mant"          -0.58
    "rant"           -0.58
    " revision"      -0.58
    " Likes"         -0.58
    " Yards"         -0.57
    " Cheong"        -0.57
    " TED"           -0.56
    Positive Logits
    Token            Logit
    "'t"             1.64
    "NOT"            1.16
    "berra"          1.14
    "adian"          1.06
    " afford"        1.03
    " easily"        0.89
    "tera"           0.88
    "ny"             0.88
    " safely"        0.88
    "isters"         0.87
    Act Density 0.177%

    No Known Activations