    Negative Logits
     fortune      -0.07
     entrar       -0.07
    &quot         -0.07
    fuck          -0.07
     donation     -0.06
                  -0.06
     room         -0.06
    	dto          -0.06
    telegram      -0.06
     claims       -0.06
    Positive Logits
    (_,           0.07
    лами          0.07
    etak          0.06
                  0.06
    .SC           0.06
    INTER         0.06
    ensi          0.06
    irmed         0.06
    trait         0.06
    acak          0.06
    Activation Density: 0.028%

    No Known Activations