INDEX
    Explanations
    Negative Logits
    <0x0D>  0.70
            0.70
    Я       0.69
            0.69
    Т       0.67
    Да      0.66
    Не      0.64
    Ин      0.64
    נ       0.63
    С       0.62
    Positive Logits
     နှစ်       0.52
    मुक्त       0.52
     aik       0.52
     refusal   0.50
    wq         0.50
     junkie    0.50
     त्यांची     0.49
     तलैया      0.49
     Jungkook  0.49
     complet   0.48
    Activation Density 0.002%

    No Known Activations