    Explanations

    phrases indicating self-awareness and reflection

    Preceding end-of-turn tokens

    Negative Logits
    Token                 Logit
    "WriteLiteral"        -0.45
    "featureID"           -0.45
    "stdc"                -0.43
    " newOwner"           -0.43
    "WriteAttribute"      -0.40
    "numerusform"         -0.40
    " CreateTagHelper"    -0.40
    " distanciation"      -0.39
    " dist"               -0.39
    "Diwedd"              -0.39
    Positive Logits
    Token                 Logit
    "Diweddarwch"         0.54
    "قایناق‌لار"          0.45
    "Glej"                0.45
    " käyttö"             0.42
    "CppCodeGen"          0.41
    " désolés"            0.41
    "˾"                   0.40
    " tantum"             0.39
    "Vezi"                0.39
    "úgó"                 0.39
    Activation Density: 0.417%

    No Known Activations