    Negative Logits
    "itaj"          -0.09
    " ngg"          -0.08
    "uptools"       -0.07
    " partic"       -0.07
    " Restaurants"  -0.07
    "ndry"          -0.07
    "rch"           -0.07
    " Exxon"        -0.07
    ".cons"         -0.07
    "entrée"        -0.07
    Positive Logits
    " glimpse"      0.08
    " stat"         0.08
    "וקים"          0.07
    " sketches"     0.07
    "ній"           0.07
    "ление"         0.07
    "_nums"         0.07
    "Nums"          0.07
    " adjectives"   0.07
    " humano"       0.07
    Activation Density: 0.003%

    No Known Activations