INDEX
    Explanations

    words related to emotional states and behaviors

    Negative Logits
    "y"            -0.43
    "yat"          -0.17
    "olest"        -0.15
    "odal"         -0.15
    " Hlav"        -0.14
    "yah"          -0.14
    " Gros"        -0.14
    " Kab"         -0.14
    "PointSize"    -0.14
    "affer"        -0.13
    Positive Logits
    "ym"           0.39
    "ypass"        0.39
    "ystate"       0.39
    "ystack"       0.38
    "ies"          0.37
    "ysize"        0.37
    "ys"           0.37
    "yst"          0.36
    "ypress"       0.36
    "yp"           0.35
    Act Density 0.053%

    No Known Activations
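
    The activation density figure above can be read as the share of token positions on which the feature fires. The sketch below assumes that reading, which the page itself does not state. The function name `activation_density`, the zero threshold, and the sample activations are all hypothetical and are not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is active.
    """
    values = list(activations)
    if not values:
        # No tokens observed: report zero density rather than dividing by zero.
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical per-token activations for one feature (not real dashboard data).
sample = [0.0, 0.0, 0.0, 2.1, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)

# Format as a percentage, in the same style as the "Act Density" line.
print(f"Act Density {density:.3%}")
```

    Under this reading, a density of 0.053% would mean the feature is active on roughly 1 in every 1,900 tokens, which is consistent with a sparse feature.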