INDEX
    Explanations

    phrases related to advocating for particular concepts or beliefs

    Negative Logits
    craft       -0.81
    cies        -0.73
    mares       -0.71
     besides    -0.69
     writes     -0.68
    ells        -0.67
    thood       -0.67
    fn          -0.67
    ersen       -0.67
    each        -0.67
    Positive Logits
     easiest     1.26
     strongest   1.25
     same        1.22
     hardest     1.17
     simplest    1.17
     largest     1.15
     heaviest    1.13
     biggest     1.13
     smallest    1.13
     greatest    1.13
    Activation Density: 0.251%

    No Known Activations