INDEX
    Explanations

    phrases related to comparisons or contrasts

    repeated instances of the word "the."

    Negative Logits
    arate      -0.79
    antes      -0.79
    Ò          -0.77
    imi        -0.76
    thood      -0.75
    ceive      -0.73
    icia       -0.72
    bg         -0.71
    arettes    -0.71
    ania       -0.71
    Positive Logits
     latter      1.30
     biggest     1.21
     vast        1.19
     majority    1.16
     sheer       1.14
     simplest    1.12
     absence     1.10
     slightest   1.10
     latest      1.09
    oret         1.08
    Act Density 0.357%

    No Known Activations