    Explanations

    phrases indicating outcomes or conclusions

    Negative Logits
    Token        Logit
    "oria"       -0.16
    "onis"       -0.16
    "tery"       -0.16
    "/english"   -0.15
    "vertiser"   -0.15
    "etten"      -0.15
    "duk"        -0.15
    ".ejb"       -0.14
    "orex"       -0.14
    "еку"        -0.14
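    Positive Logits
    (tokens are quoted because a leading space is part of the token)
    Token        Logit
    " boil"       0.30
    " boils"      0.28
    " boiled"     0.25
    " Bo"         0.23
    " boiling"    0.22
    " down"       0.21
    "_bo"         0.19
    " boz"        0.19
    "bo"          0.19
    "Bo"          0.18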
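The positive and negative logit lists above are commonly produced by scoring every vocabulary token with the dot product of the feature's decoder direction and that token's unembedding vector, then keeping the highest and lowest scores. The following is a minimal sketch under that assumption only: real pipelines may also fold in the final layer norm, which is ignored here, and the function names `logit_effects` and `top_and_bottom` as well as the toy vectors are hypothetical, not this dashboard's actual code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score each token by dotting the (hypothetical) feature direction with its unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vector))
        for token, vector in unembedding.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy 2-dimensional example; values are powers of two so the arithmetic is exact.
direction = [1.0, 0.5]
vocab = {
    " boil": [0.25, 0.5],
    " down": [0.125, 0.25],
    "oria": [-0.25, 0.0],
    "duk": [0.0, -0.25],
}
top, bottom = top_and_bottom(logit_effects(direction, vocab), k=2)
print(top)     # [(' boil', 0.5), (' down', 0.25)]
print(bottom)  # [('oria', -0.25), ('duk', -0.125)]
```

Because the score is linear in the feature direction, a token whose unembedding points along the feature (here " boil") rises to the top list, and one pointing against it falls to the bottom list.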
    Activation density: 0.099%
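Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero, so 0.099% corresponds to roughly one token in every 1,010. The following is a minimal sketch under that assumption; the zero threshold and the function name `activation_density` are illustrative and not taken from this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation value")
    return 100.0 * active / total


# Toy example: 99 firing tokens out of 100,000 reproduces the 0.099% figure.
acts = [0.0] * 100_000
for i in range(0, 99_000, 1_000):
    acts[i] = 1.0
print(f"{activation_density(acts):.3f}%")  # prints "0.099%"
```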

    No known activation examples.