    Explanations

    phrases that indicate additional information or emphasize various points in a discussion

    Text following transition words

    introducing further information

    Negative Logits
     først          -0.60
    まずは          -0.60
     nonetheless    -0.56
     primarily      -0.56
     nevertheless   -0.56
    primarily       -0.55
     eerst          -0.55
     ابتدا          -0.54
    icitis          -0.54
    首先            -0.53
    Positive Logits
    Personendaten   0.38
     added          0.38
    postsleuth      0.36
     ditambah       0.35
     informée       0.35
     engraçadas     0.34
    Legături        0.33
     fordi          0.33
    arXiv           0.33
    ężczy           0.33
    Activation Density: 0.475%

    No Known Activations