    Negative Logits
    Token       Logit
    "алося"     -0.07
    "dera"      -0.07
    "weise"     -0.07
    " třetí"    -0.07
    " Sadd"     -0.07
    "214"       -0.06
    "اها"       -0.06
    " vztah"    -0.06
    " кораб"    -0.06
    " Jahre"    -0.06
    Positive Logits
    Token          Logit
    " Clean"       0.11
    " cleaned"     0.11
    "Clean"        0.11
    " cleaning"    0.11
    " clean"       0.10
    " Cleaning"    0.10
    " CLEAN"       0.09
    " cleansing"   0.09
    " cleaner"     0.08
    " cleaners"    0.08
    Act Density 0.018%

    No Known Activations