INDEX
    Explanations

    dates or time related expressions

    references to a specific character or figure in a narrative, particularly with names or titles

    Negative Logits
     Beir         -0.77
     hypers       -0.70
    itionally     -0.66
     Claus        -0.65
    arella        -0.64
    ivid          -0.64
     killer       -0.64
     Cind         -0.63
    agne          -0.63
     LAPD         -0.63
    Positive Logits
    Åį            1.11
    nen           1.04
    ··            0.98
    ¬             0.97
    Å«            0.93
    su            0.91
    nin           0.90
    shi           0.90
    rates         0.85
    jin           0.84
    Activation Density 0.007%

    No Known Activations