INDEX
    Explanations

    references to actions and emotional responses in conversational contexts

    Negative Logits
    Token               Logit
    " CreateTagHelper"  -0.86
    " AssemblyCulture"  -0.68
    "Jereo"             -0.62
    "tawesome"          -0.61
    "hdashline"         -0.59
    "uvwxyz"            -0.59
    "kloped"            -0.59
    "riwal"             -0.58
    " Paglinawan"       -0.58
    "GEBURTSDATUM"      -0.57

    Positive Logits
    Token               Logit
    " للمعارف"          0.66
    "Inter"             0.54
    " surla"            0.52
    "selt"              0.46
    " chì"              0.46
    "uden"              0.46
    "uteen"             0.46
    "apal"              0.46
    "inter"             0.45
    " Inter"            0.45
    Activation Density: 0.035%

    No Known Activations