    Negative Logits
     manners  0.69
     negligently  0.68
     miraculous  0.67
     любые  0.64
     negligent  0.63
     any  0.63
     negligence  0.63
     falle  0.62
     любы  0.62
    0.62
    Positive Logits
    vdash  0.61
    あなたは  0.58
    libc  0.57
    ":"  0.56
     perché  0.56
     #:  0.56
     Steep  0.55
    あなた  0.55
    资产  0.55
    gerät  0.54
    Act Density 0.066%

    No Known Activations