    Negative Logits
    Token                Logit
    "favorite"           -0.08
    " favorit"           -0.08
    " individualized"    -0.08
    " particuli"         -0.08
    " iki"               -0.08
    " پسند"              -0.08
    "favorites"          -0.08
    "($."                -0.08
    " $↵"                -0.08
    " Speed"             -0.07

    Positive Logits
    Token                Logit
    " indirectly"        0.23
    " indirect"          0.21
    "Indirect"           0.21
    " indire"            0.16
    " INDIRECT"          0.13
    "irect"              0.12
    " proxy"             0.11
    "Proxy"              0.10
    " broader"           0.09
    "proxy"              0.09

    Activation Density: 0.098%

    No Known Activations