    Explanations

    instances of the word "while"

    Negative Logits
    Token          Logit
    "erap"         -0.16
    "ught"         -0.15
    ".twitter"     -0.15
    "WARDED"       -0.14
    "ãģĨãģ¡"       -0.14
    "ieren"        -0.14
    "geb"          -0.13
    "anine"        -0.13
    "ké"           -0.13
    "ìĿ¸ëį°"       -0.13

    Positive Logits
    Token          Logit
    " there"        0.36
    " it"           0.31
    " we"           0.28
    "there"         0.27
    " some"         0.27
    " none"         0.25
    " nobody"       0.23
    " many"         0.23
    " nothing"      0.23
    " this"         0.23
    Activation Density: 0.069%

    No Known Activations