    Explanations

    instances of the word "that."

    Negative Logits
    šen        -0.17
    inux       -0.16
    ikut       -0.16
    å§ĵ        -0.16
    252        -0.15
    ingu       -0.15
    rompt      -0.15
    glomer     -0.15
     Gia       -0.15
    ihn        -0.14
    Positive Logits
    ai         0.15
     हल        0.14
     prive     0.14
    afi        0.14
    OTH        0.14
    ekler      0.14
    667        0.14
    ileÅŁ      0.13
     LSM       0.13
    ves        0.13
    Act Density 0.023%

    No Known Activations