INDEX
    Explanations

    the word "while" or its variations, indicating temporal transitions or contrasts in text

    Negative Logits
    erap
    -0.17
     but
    -0.17
     też
    -0.16
     another
    -0.15
     whereas
    -0.15
    orsi
    -0.15
    oise
    -0.15
     ancak
    -0.14
     hatta
    -0.14
    另外
    -0.14
    Positive Logits
     initially
    0.22
    yes
    0.20
    s
    0.20
     yes
    0.19
     Initially
    0.17
     certainly
    0.17
     most
    0.17
     not
    0.17
    Initially
    0.17
     there
    0.16
    Act Density 0.023%

    No Known Activations