INDEX
    Explanations

    phrases related to additions or new elements being introduced

    instances of the word "addition" in various contexts

    Negative Logits
    Token        Logit
    "zh"         -0.70
    "zees"       -0.67
    "rior"       -0.65
    "raz"        -0.65
    "yah"        -0.63
    "bis"        -0.62
    "zi"         -0.62
    "zee"        -0.61
    "walking"    -0.61
    "mos"        -0.60
    Positive Logits
    Token           Logit
    "endum"         0.99
    "xual"          0.86
    "ition"         0.84
    "itious"        0.80
    "verted"        0.78
    " Flavoring"    0.76
    " thereto"      0.73
    "xon"           0.73
    " insult"       0.71
    " bonus"        0.69
    Activation Density 0.020%

    No Known Activations