INDEX
    Explanations

    references to physical appearances, especially facial expressions and features

    Negative Logits
    "udes"       -0.19
    "ieder"      -0.17
    "овÑĸд"      -0.17
    "cedure"     -0.17
    "esus"       -0.16
    "AYOUT"      -0.15
    "141"        -0.15
    "ency"       -0.15
    "AndView"    -0.15
    "aptor"      -0.14
    Positive Logits
    "/head"      0.18
    " Rooney"    0.17
    "(face"      0.15
    "ushima"     0.14
    "-face"      0.14
    "ually"      0.14
    " candy"     0.13
    " faces"     0.13
    "/body"      0.13
    "-quote"     0.13
    Act Density 0.036%

    No Known Activations