    Explanations

    words related to physical processes or transformations

    Negative Logits
    Token       Logit
    "adolu"     -0.17
    "/on"       -0.15
    "/up"       -0.14
    "/from"     -0.14
    "ad"        -0.14
    "iger"      -0.14
    "速"        -0.14
    " піш"      -0.13
    "途"        -0.13
    "орон"      -0.13
    Positive Logits
    Token       Logit
    " out"      0.21
    "-out"      0.21
    "-up"       0.20
    " up"       0.16
    "-off"      0.16
    "出来"      0.16
    "LEAN"      0.15
    "-down"     0.15
    " off"      0.14
    "굴"        0.14
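    Feature dashboards of this kind commonly build such lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the tokens with the smallest and largest scores. The page does not say how its numbers were computed, so the sketch below only assumes that convention; the helper names `logit_scores` and `top_logits`, the vectors, and the four-token vocabulary are hypothetical toy values, not the data behind these tables.

```python
from typing import Dict, List, Tuple


def logit_scores(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, col)) for tok, col in unembed.items()}


def top_logits(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy 3-dimensional example with made-up numbers (assumption, not real model weights).
decoder_dir = [0.5, -0.2, 0.1]
unembed = {
    " out": [0.4, 0.0, 0.2],
    " up": [0.2, -0.3, 0.0],
    "/on": [-0.3, 0.1, 0.0],
    "ad": [-0.1, 0.4, -0.2],
}
scores = logit_scores(decoder_dir, unembed)
neg, pos = top_logits(scores, k=2)
print("negative:", [(t, round(s, 2)) for t, s in neg])
print("positive:", [(t, round(s, 2)) for t, s in pos])
```

    Sorting once and slicing both ends keeps the two lists consistent with each other; a real model's vocabulary would use a vectorized matrix product instead of Python loops.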
    Activation density: 0.324%
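    Activation density is usually read as the percentage of sampled tokens on which the feature's activation is nonzero. A minimal sketch of that arithmetic, assuming a flat list of per-token activations and a zero threshold (both are assumptions; the page does not state its sample or threshold), with `activation_density` as a hypothetical helper name:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up sample: 1,000 tokens, 3 of which activate the feature.
acts = [0.0] * 1000
for i in (12, 480, 951):
    acts[i] = 1.7
density = activation_density(acts)
print(f"{density:.3f}%")
```

    On this reading, a density of 0.324% means the feature fires on roughly 3 of every 1,000 tokens.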

    No Known Activations