    NEGATIVE LOGITS
    Token                  Logit
    " Kenney"              0.45
    " griseo"              0.40
    " Robbins"             0.36
    "rotto"                0.36
    (token not rendered)   0.36
    "ከል"                   0.36
    " Knoll"               0.36
    "ʖ"                    0.36
    (token not rendered)   0.36
    " Kassel"              0.35
    POSITIVE LOGITS
    Token                  Logit
    " CONT"                0.41
    "Presented"            0.39
    " PLA"                 0.39
    "ला"                   0.38
    " QA"                  0.38
    " добро"               0.38
    "isPresent"            0.38
    "Veteran"              0.37
    "Interactor"           0.36
    " presenter"           0.36
    Activation Density: 0.001%

    No Known Activations