INDEX
    Explanations

    phrases related to direction or manner of action

    Negative Logits
    ipay       -0.17
    üss        -0.16
    .pixel     -0.15
    assic      -0.15
    itters     -0.15
    chy        -0.14
    agram      -0.14
    ká         -0.14
    adders     -0.14
     Bowen     -0.14

    Positive Logits
    finding    0.19
    ajar       0.18
    тин        0.16
    ward       0.15
    lon        0.15
    ana        0.15
    691        0.14
    UDA        0.14
    ne         0.14
    від        0.14
    Activation Density: 0.026%

    No Known Activations