INDEX
    Explanations
    NEGATIVE LOGITS
    Token  Value
    "Kos"  0.77
    "appendText"  0.76
    " ой"  0.75
    " другое"  0.73
    " পারছি"  0.72
    "וי"  0.71
    "다른"  0.70
    "আয়"  0.70
    " آ"  0.69
    "وین"  0.69
    POSITIVE LOGITS
    Token  Value
    "chent"  0.84
    " courage"  0.81
    " diligence"  0.80
    " deceive"  0.80
    " enticing"  0.79
    " assured"  0.79
    " awareness"  0.79
    " dubbed"  0.78
    " irresistible"  0.77
    " deceptive"  0.77
    Activation Density: 0.001%

    No Known Activations