INDEX
    Explanations

    phrases indicating addition or inclusion

    Negative Logits
    Token     Logit
    "yah"     -0.65
    "NING"    -0.64
    " Zel"    -0.60
    "ARE"     -0.60
    "zing"    -0.59
    "cdn"     -0.58
    " THR"    -0.58
    "grim"    -0.57
    "mare"    -0.57
    "rior"    -0.56
    Positive Logits
    Token          Logit
    "endum"        1.32
    "itional"      1.18
    "ictions"      1.14
    "ition"        1.10
    "itions"       1.09
    "ressing"      1.09
    "itionally"    1.07
    "resses"       1.03
    "ictive"       1.03
    " insult"      1.02
    Activation Density 0.543%

    No Known Activations