    Explanations

    phrases indicating logical or justifiable explanations or causes

    phrases indicating justification or rationale

    Negative Logits
    semble          -0.83
    chin            -0.77
     Carbuncle      -0.70
    oba             -0.70
     Ping           -0.64
    ega             -0.63
     Territories    -0.62
    rongh           -0.61
    inav            -0.61
    eg              -0.61
    Positive Logits
     why            0.92
     justifying     0.91
     whatsoever     0.85
    pointers        0.84
    Reviewer        0.81
    abl             0.76
     justify        0.74
     justification  0.73
    forward         0.73
     WHY            0.72
    Activation Density: 0.026%

    No Known Activations