    Explanations

    instances of the word "this" and phrases expressing emphasis or significance

    Negative Logits
    Token         Logit
    "wiki"        -0.17
    "elson"       -0.15
    "odd"         -0.15
    "ι"           -0.15
    "ons"         -0.14
    "nt"          -0.14
    "id"          -0.14
    " wonders"    -0.14
    "orthand"     -0.14
    "iler"        -0.13
    Positive Logits
    Token         Logit
    " is"         0.27
    "是我"        0.20
    " was"        0.19
    " isn"        0.19
    " morning"    0.18
    " entire"     0.17
    " wasn"       0.17
    " whole"      0.17
    " thing"      0.16
    "timeofday"   0.16
    Activation Density: 0.182%

    No Known Activations