INDEX
    Explanations

    phrases indicating speculation, understanding, or assurance

    expressions of prediction or assumption

    Negative Logits
    Token           Logit
    " hypers"       -0.69
    " bats"         -0.67
    "raviolet"      -0.67
    " mutants"      -0.64
    " Kik"          -0.63
    " showc"        -0.63
    " Ambro"        -0.63
    "Baltimore"     -0.63
    "ilty"          -0.62
    " decom"        -0.61
    Positive Logits
    Token           Logit
    "ħĭ"            0.84
    "idate"         0.78
    "ĵĺ"            0.77
    "uate"          0.75
    " firsthand"    0.72
    "rue"           0.68
    "alogy"         0.66
    "atos"          0.65
    " myself"       0.65
    " confidently"  0.64
    Act Density 0.122%

    No Known Activations