INDEX
    Explanations

    information revealing surprising or unexpected facts

    phrases that emphasize revelations or surprising conclusions

    Negative Logits
    Token          Logit
    "icipated"     -0.72
    "cious"        -0.70
    "oided"        -0.67
    "ilater"       -0.67
    "uli"          -0.66
    "ombs"         -0.66
    "comm"         -0.66
    "shaw"         -0.65
    "notations"    -0.65
    "resents"      -0.64
    Positive Logits
    Token          Logit
    " there"       0.84
    " nobody"      0.71
    "âĶĢ"          0.71
    "ymes"         0.65
    " they"        0.65
    " quite"       0.63
    " Professor"   0.62
    " none"        0.62
    "din"          0.62
    " THERE"       0.62
    Activation Density: 0.036%

    No Known Activations