INDEX
    Explanations

    works perfectly, fine, possible

    Negative Logits
    "而非"            0.44
    "ים"              0.43
    "のではなく"      0.41
    "而不是"          0.40
    " Lucifer"        0.39
    "Examination"     0.38
    " Акы"            0.38
    " đức"            0.38
    "sPath"           0.38
    " preventable"    0.38
    Positive Logits
    " but"            0.55
    " 👌"             0.55
    "👌"              0.54
    " pero"           0.54
    " aber"           0.52
    "ل"               0.52
    " પરંતુ"          0.50
    " arrêt"          0.49
    " aftermarket"    0.49
                      0.48
    Act Density 0.010%

    No Known Activations