    Negative Logits
     Va          -0.08
     velk        -0.06
     frau        -0.06
    „P           -0.06
     provoc      -0.06
     Michel      -0.06
    -cli         -0.06
     Однако      -0.06
    。(           -0.06
     polarity    -0.06
    Positive Logits
     conjunto     0.07
    owler         0.07
     Sonata       0.07
     elemento     0.07
     oppose       0.06
    мя            0.06
     звер         0.06
    !",↵          0.06
     зв           0.06
     pron         0.06
    Act Density 0.010%

    No Known Activations