INDEX
    Explanations

    determiners and words expressing certainty or confirmation

    Negative Logits
    zan          -0.67
    anches       -0.65
    ãĤī          -0.65
    tsy          -0.64
    azar         -0.59
    onder        -0.57
    irm          -0.57
     Guam        -0.56
    uclear       -0.56
    andy         -0.56
    Positive Logits
     supposed    0.99
     meant       0.89
     gonna       0.89
     going       0.85
    nt           0.84
     doing       0.82
     worth       0.79
     happening   0.77
     referring   0.77
     anyways     0.77
    Act Density 0.084%

    No Known Activations