INDEX
    Explanations

    instances of significant actions or events involving strong physical interactions

    Negative Logits
    Token        Logit
    "askell"     -0.15
    "oust"       -0.15
    "chant"      -0.14
    "agli"       -0.14
    "opia"       -0.14
    "lem"        -0.14
    "xCD"        -0.14
    "lius"       -0.14
    "stitute"    -0.14
    "átor"       -0.13
    Positive Logits
    Token            Logit
    "ething"         0.16
    " Directions"    0.15
    "pra"            0.15
    "notated"        0.14
    " angel"         0.14
    "raž"            0.14
    "660"            0.14
    "Directions"     0.13
    "ved"            0.13
    "ahlen"          0.13
    Act Density 0.551%

    No Known Activations