    Explanations

    phrases that indicate success or notable events

    Negative Logits
    Token                 Logit
    " Monfieur"           -0.84
    " Theſe"              -0.84
    " myſelf"             -0.74
    "GrantedAuthority"    -0.73
    " Paglinawan"         -0.72
    " Jefus"              -0.72
    "клопе"               -0.72
    " Shakspeare"         -0.71
    " Diſ"                -0.70
    " iſt"                -0.69
    Positive Logits
    Token                 Logit
    " now"                0.61
    "now"                 0.53
    "AndEndTag"           0.46
    " Now"                0.44
    "."                   0.43
    "<strong>"            0.43
    "Now"                 0.42
    " up"                 0.42
    "p"                   0.41
    "n"                   0.41
    Activation Density: 0.579%

    No Known Activations