INDEX
    Explanations

    the word "this" and similar pronouns or determiners referring to a specific concept or situation

    Negative Logits
    "»Ĵ"            -0.73
    " Antar"        -0.72
    "Ö¼"            -0.69
    "geons"         -0.66
    "Reader"        -0.62
    "ename"         -0.61
    "atre"          -0.60
    "ewitness"      -0.60
    "inch"          -0.59
    "atively"       -0.58
    Positive Logits
    " ado"           0.93
    " transpired"    0.90
    " nonsense"      0.83
    " happened"      0.82
    " fuss"          0.82
    " madness"       0.81
    " stuff"         0.81
    "stuff"          0.80
    " happening"     0.76
    " hoop"          0.76
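    Token lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative tokens. The sketch below assumes that approach. The function names `logit_effects` and `top_and_bottom`, the 2-dimensional vectors, and their values are all hypothetical, and the page does not state how its values were scaled.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers: a real computation would use the model's actual
# decoder weights and unembedding matrix, not these toy lists.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature direction with each token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example; the token names echo the lists above but the numbers are made up.
decoder_dir = [1.0, -0.5]
unembed = {
    " ado": [0.9, -0.1],
    " happened": [0.8, 0.0],
    "Reader": [-0.6, 0.1],
    " Antar": [-0.5, 0.4],
}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print([t for t, _ in neg])  # [' Antar', 'Reader']
print([t for t, _ in pos])  # [' ado', ' happened']
```

    Because tokens are compared by their full string, " stuff" and "stuff" count as distinct entries, which is why both appear in the positive list.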
    Activation Density: 0.064%
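    Activation density is usually read as the share of token positions on which the feature fires, i.e. has an activation above zero. A minimal sketch under that assumption; the name `activation_density`, the zero threshold, and the sample data are hypothetical, not taken from this page.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Made-up sample: the feature fires on 64 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(0, 64_000, 1_000):  # 64 evenly spaced positions
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.064%
```

    Under this reading, a density of 0.064% means the feature fires on roughly one token in every 1,560.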

    No Known Activations