INDEX
    Explanations

    phrases indicating movement or action

    past tense verbs and certain action-related phrases

    Negative Logits
    avery  -0.70
    mong   -0.67
    hov    -0.66
    eway   -0.65
    vier   -0.65
    WER    -0.64
    icist  -0.64
    bold   -0.64
    gart   -0.63
    ateg   -0.62
    Positive Logits
    join      0.66
     Chimera  0.61
     THEM     0.59
    strap     0.58
     Leopard  0.58
    ipeg      0.57
     them     0.56
     Scorp    0.56
    æĸ¹       0.56
     VIDEOS   0.55
    Activation Density: 0.273%

    No Known Activations