    Explanations

    words related to actions and roles in various contexts

    Negative Logits
    "-toggler"   -0.17
    "agher"      -0.15
    "umber"      -0.15
    "onis"       -0.15
    "ughter"     -0.14
    "witter"     -0.14
    "bject"      -0.14
    "kyt"        -0.14
    "ayo"        -0.14
    "elow"       -0.14

    Positive Logits
    "lava"        0.17
    "OKIE"        0.14
    " Dud"        0.14
    "flare"       0.14
    " vil"        0.13
    "robe"        0.13
    " Alic"       0.13
    "473"         0.13
    "quil"        0.13
    "llib"        0.13

    Activation density: 0.038%

    No known activations.