    Negative Logits
        Token          Logit
        " review"      -1.73
        "review"       -1.59
        "Review"       -1.52
        " Review"      -1.50
        " reviewing"   -1.47
        " REVIEW"      -1.42
        " reviewed"    -1.41
        " Reviewing"   -1.33
        " reviewer"    -1.27
        "REVIEW"       -1.25
    Positive Logits
        Token          Logit
        " of"          0.88
        "ist"          0.55
        " to"          0.54
        " by"          0.54
        "sigs"         0.50
        " and"         0.47
        " against"     0.46
        "."            0.46
        " with"        0.44
        " for"         0.44
    Activation Density: 0.379%

    No Known Activations