INDEX
    Explanations

    personal reflection or commentary within sentences

    phrases expressing concern or caution

    Negative Logits
     targ       -0.80
    è¦ļéĨĴ      -0.79
    ipers       -0.78
    area        -0.71
    rir         -0.68
    artney      -0.66
    raq         -0.66
     explan     -0.64
     faintly    -0.63
    battle      -0.63
    Positive Logits
     somew      0.69
     Penguin    0.69
     none       0.66
     neither    0.65
    chery       0.65
     ignorance  0.62
    nob         0.61
    imaru       0.61
     Meow       0.61
     injuries   0.59
    Activation Density: 0.132%

    No Known Activations