INDEX
    Explanations
    Negative Logits
    Token          Logit
    " provision"   -0.08
    "Chan"         -0.08
    " Chan"        -0.08
    " landmark"    -0.07
    " influenza"   -0.07
    " جس"          -0.07
    "जिस"          -0.07
    "vole"         -0.07
    "OLUTION"      -0.07
    "='+"          -0.07
    Positive Logits
    Token          Logit
    " Sym"         0.08
    "posição"      0.08
    (blank)        0.07
    " зерк"        0.07
    "Mirror"       0.07
    "mir"          0.07
    " negatively"  0.07
    "reti"         0.07
    "ignon"        0.07
    "ówki"         0.07
    Act Density 0.001%

    No Known Activations