INDEX
    Explanations

    expressions of strong opinions or emotional reactions

    Expressing opinions or reactions

    exclamations and sentiments

    Negative Logits
     laſſen
    -0.58
     queſta
    -0.57
    addPreferredGap
    -0.56
    centralwidget
    -0.56
     iNdEx
    -0.55
    LabelTagHelper
    -0.54
     ujednoznacz
    -0.54
    -0.52
     arşivlendi
    -0.52
    ロウィン
    -0.52
    Positive Logits
    !
    0.44
     Signalez
    0.42
    ?
    0.40
    !!!
    0.38
     indeed
    0.35
    ?!
    0.34
     alın
    0.33
    Cyfeiriadau
    0.32
     idea
    0.32
     relâche
    0.32
    Activation Density: 0.219%

    No Known Activations