INDEX
    Explanations

    expressions related to personal reflection, questioning, and opinions

    Negative Logits
    Token          Logit
    "oway"         -0.74
    "ilings"       -0.71
    "iling"        -0.70
    "represented"  -0.66
    "avid"         -0.65
    "senal"        -0.64
    "cers"         -0.64
    " arrivals"    -0.63
    "fter"         -0.63
    "lication"     -0.61
    Positive Logits
    Token          Logit
    " raining"     1.34
    " happen"      0.99
    " hurts"       0.92
    " happened"    0.90
    " happens"     0.81
    " happening"   0.80
    " easier"      0.79
    " kinda"       0.77
    " depends"     0.74
    " beh"         0.74
    Activation Density: 1.693%

    No Known Activations