INDEX
    Explanations

    pronouns and determiners that often signal references to people, events, or specifics in a narrative context

    Negative Logits
    "_mC"           -0.20
    "_mB"           -0.18
    "_mD"           -0.17
    "_mE"           -0.16
    " sovereign"    -0.15
    " iter"         -0.15
    "otts"          -0.15
    "_tD"           -0.15
    ".uni"          -0.14
    "itter"         -0.14
    Positive Logits
    "isha"          0.18
    " fir"          0.15
    " Fir"          0.14
    "umerator"      0.14
    "imes"          0.14
    "ninger"        0.14
    "oka"           0.14
    "oken"          0.14
    "compan"        0.14
    "inda"          0.13
    Activation Density: 0.270%

    No Known Activations