INDEX
    Explanations

    words related to emphasizing importance or significance

    the word "the" in various contexts throughout the text

    Negative Logits
    Token         Logit
    "craft"       -0.82
    "claw"        -0.74
    "each"        -0.68
    "fw"          -0.68
    " besides"    -0.67
    "leeve"       -0.67
    "abuse"       -0.67
    "adoes"       -0.67
    "ago"         -0.65
    "rade"        -0.64

    Positive Logits
    Token         Logit
    " easiest"    1.27
    " simplest"   1.22
    " same"       1.18
    " strongest"  1.17
    " greatest"   1.15
    " biggest"    1.13
    " heaviest"   1.12
    " largest"    1.11
    " smallest"   1.10
    " hardest"    1.09
    Activation Density: 0.303%

    No Known Activations