    Explanations

    significant emphasis on the word "this" in various contexts

    Negative Logits
    Token           Logit
    "ial"           -0.16
    "eday"          -0.15
    "uts"           -0.15
    "tr"            -0.15
    " reasons"      -0.14
    "that"          -0.14
    "bug"           -0.14
    "raison"        -0.13
    "the"           -0.13
    "itoris"        -0.13
    Positive Logits
    Token           Logit
    " particular"    0.41
    "/th"            0.33
    " entire"        0.27
    " guy"           0.27
    " PARTICULAR"    0.26
    "/her"           0.25
    " whole"         0.23
    "zelf"           0.22
    " exact"         0.22
    " latest"        0.21
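    The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way such dashboards compute them, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then keep the k most negative and k most positive results. The sketch below uses tiny made-up vectors in plain Python; `logit_effects` and `top_and_bottom` are hypothetical names for illustration, not a library API.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature's decoder direction
    with each token's unembedding vector (assumed to have the same length)."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


# Toy example with made-up 2-dimensional vectors (not real model weights).
decoder_dir = [1.0, 0.0]
unembed = {" particular": [0.4, 1.0], "ial": [-0.2, 3.0], " guy": [0.1, 0.0]}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print("negative:", neg)
print("positive:", pos)
```

    With a real model the decoder direction and unembedding would come from the SAE and the language model, and k would be 10 to match the lists on this page.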
    Act Density 0.387%

    No Known Activations