INDEX
    Explanations

    references to previously mentioned entities or concepts

    Negative Logits
       
    -0.19
     ï
    -0.16
    /she
    -0.16
    ...↵
    -0.15
    /or
    -0.15
    â̦
    -0.15
    DOT
    -0.15
     recently
    -0.15
    —the
    -0.14
    -0.14
    Positive Logits
    /current
    0.22
    /original
    0.22
    most
    0.19
    mente
    0.19
    aneously
    0.18
    lest
    0.18
    -issue
    0.18
    liest
    0.17
    ständ
    0.17
    -lived
    0.16
    Activation Density: 1.320%

    No Known Activations