    Explanations

    actions related to change, movement, or transformation

    Negative Logits
    Token            Logit
    " Heard"         -0.17
    "pon"            -0.15
    "кта"            -0.15
    "erv"            -0.14
    "يه"             -0.14
    "son"            -0.13
    " extreme"       -0.13
    " everything"    -0.13
    "ect"            -0.13
    "ree"            -0.13

    Positive Logits
    Token            Logit
    "instead"        0.17
    "Instead"        0.17
    " Instead"       0.16
    "coni"           0.15
    "DEM"            0.15
    " instead"       0.14
    "icone"          0.14
    "休"             0.14
    "omor"           0.14
    "ente"           0.14

    Activation Density: 0.033%

    No Known Activations