INDEX
    Explanations

    phrases with the word "of" followed by highly activating words

    phrases indicating possession or inclusion

    Negative Logits
    Token          Logit
    "dayName"      -0.80
    " condem"      -0.68
    "eele"         -0.65
    " disposed"    -0.63
    "uterte"       -0.62
    "illac"        -0.61
    "ettel"        -0.61
    " subst"       -0.60
    "gee"          -0.60
    "uca"          -0.60
    Positive Logits
    Token            Logit
    "THING"          0.81
    " sorts"         0.78
    "ahu"            0.74
    " sudden"        0.71
    " goddamn"       0.71
    "together"       0.71
    " imaginable"    0.70
    " course"        0.68
    "ources"         0.67
    "important"      0.66
    Activation Density: 0.080%

    No Known Activations