INDEX
    Explanations

    phrases related to claims or assertions of identity or status

    phrases where someone is making a claim

    Negative Logits
    Token         Logit
    "furt"        -0.71
    "course"      -0.70
    "bats"        -0.65
    "specified"   -0.65
    "noticed"     -0.65
    "river"       -0.63
    "apps"        -0.60
    " Rapids"     -0.58
    "cart"        -0.58
    "items"       -0.57
    Positive Logits
    Token           Logit
    " specialize"   0.96
    " embody"       0.89
    " represent"    0.88
    " derive"       0.86
    " be"           0.85
    " recreate"     0.85
    " perform"      0.80
    " solve"        0.80
    " speak"        0.80
    " have"         0.80
    Activation Density: 0.034%

    No Known Activations