INDEX
    Explanations
    NEGATIVE LOGITS
    ین
    0.54
    الت
    0.52
     irony
    0.51
    ドラ
    0.49
    Т
    0.49
    و
    0.48
    دا
    0.48
    0.48
     fellowship
    0.47
    ТИ
    0.47
    POSITIVE LOGITS
     tjen
    0.54
     Giáo
    0.50
    sema
    0.49
    0.49
     khác
    0.48
     lưu
    0.47
    ofd
    0.47
     然后
    0.47
     Cria
    0.47
     白色
    0.46
    Act Density 0.001%

    No Known Activations