    Explanations

    references to facial expressions and emotional states

    Negative Logits
    "egin"      -0.20
    "rollo"     -0.17
    " dizzy"    -0.15
    "urette"    -0.14
    "ovsky"     -0.14
    "mind"      -0.14
    "buzz"      -0.14
    "braco"     -0.13
    "Sharper"   -0.13
    "phet"      -0.13
    Positive Logits
    " expression"    0.43
    " expressions"   0.38
    " Expression"    0.37
    "expression"     0.35
    "Expression"     0.35
    "-expression"    0.34
    " facial"        0.33
    " expres"        0.32
    "表情"           0.30
    "Expressions"    0.28
    Activation Density: 0.211%

    No Known Activations