    Explanations

    phrases indicating rationale or justification

    phrases emphasizing the justification or rationale behind statements

    Negative Logits
    Token           Logit
    "chron"         -0.75
    "chin"          -0.68
    "tein"          -0.68
    " Carbuncle"    -0.65
    "inav"          -0.64
    "semble"        -0.64
    "eg"            -0.62
    "ega"           -0.61
    " Warcraft"     -0.60
    " ages"         -0.60
    Positive Logits
    Token           Logit
    " why"          1.41
    " WHY"          1.19
    "why"           1.17
    "abl"           1.12
    "Why"           0.94
    " Why"          0.88
    " justifying"   0.84
    "pointers"      0.81
    "Reviewer"      0.80
    " rationale"    0.78
    Activation Density: 0.037%

    No Known Activations