INDEX
    Explanations

    instances of the word "earlier."

    Negative Logits
    yla
    -0.18
     little
    -0.16
    sWith
    -0.15
    omet
    -0.15
    ritch
    -0.14
    boxes
    -0.14
    ses
    -0.14
    yl
    -0.14
    SES
    -0.14
     older
    -0.14
    Positive Logits
    -than
    0.37
    _than
    0.33
     than
    0.32
     THAN
    0.27
    than
    0.26
    Than
    0.23
     Than
    0.22
    _THAN
    0.20
     než
    0.20
     ніж
    0.19
    Act Density 0.014%

    No Known Activations