INDEX
    Explanations

    elements of natural language that express complex feelings or experiences

    Negative Logits
     General    -0.17
    Архів       -0.17   (Ukrainian: "archive")
     ab         -0.17
     in         -0.16
     a          -0.16
     sub        -0.16
     real       -0.16
     rel        -0.16
     pro        -0.16
     and        -0.15
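Some token strings in these logit lists are stored in a byte-level BPE alphabet, where every raw byte is shown as a printable character. For instance, `ÐĲÑĢÑħÑĸв` is the UTF-8 word `Архів`. (The `IJ` in the scraped string is presumably the single character `Ĳ`, byte 0x90, flattened during extraction.) The sketch below reverses that table. It is a minimal version that assumes a GPT-2-style byte-to-unicode mapping, since the page does not name its tokenizer. The fallback that re-encodes characters outside the byte alphabet as UTF-8 is a heuristic for strings a viewer has already half-decoded. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table mapping each byte 0-255 to a printable character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Unprintable bytes (0x00-0x20, 0x7F-0xA0, 0xAD) are shifted
            # to code points 256, 257, ... in byte order.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Reverse table: printable character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into readable UTF-8 text."""
    raw = bytearray()
    for ch in token:
        if ch in BYTE_DECODER:
            raw.append(BYTE_DECODER[ch])
        else:
            # Heuristic (assumption): characters outside the byte alphabet,
            # e.g. Cyrillic a viewer already decoded, are re-encoded as UTF-8.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(repr(decode_token("ÐĲÑĢÑħÑĸв")))   # 'Архів'
    print(repr(decode_token(" ÑĩемпÑĸон")))  # ' чемпіон'
```

A fully byte-level string such as `ÑĩÐµÐ¼Ð¿ÑĸÐ¾Ð½` decodes to the same word without needing the fallback.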
    Positive Logits
     чемпіон    0.33   (Ukrainian: "champion")
     фут        0.29
     команди    0.28   (Ukrainian: "team(s)")
     клуб       0.28   (Ukrainian: "club")
     грав       0.27
     чем        0.24
     коман      0.24
     збір       0.24
     сез        0.23
     Чем        0.22
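Positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The page does not say how its values were computed. The following is therefore a minimal pure-Python sketch under that assumption. `logit_effects`, `top_and_bottom`, and the toy matrices are hypothetical. Refinements some tools apply, such as folding in the final layer norm, are omitted.

```python
from typing import Sequence


def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> list[tuple[str, float]]:
    """Score each vocabulary token by decoder_row @ unembed (d_model x vocab)."""
    if len(unembed) != len(decoder_row):
        raise ValueError("unembed must have one row per model dimension")
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        score = sum(decoder_row[i] * unembed[i][j] for i in range(len(decoder_row)))
        scores.append((token, score))
    return scores


def top_and_bottom(scores: list[tuple[str, float]], k: int):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    unembed = [[0.5, 0.0, 2.0],
               [0.0, 0.5, 1.0]]
    neg, pos = top_and_bottom(logit_effects([1.0, -1.0], unembed, vocab), k=1)
    print(neg, pos)  # [('b', -0.5)] [('c', 1.0)]
```

With a real model, `decoder_row` would be the feature's decoder vector and `unembed` the model's unembedding matrix. A vectorized library would replace the explicit loops.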
    Activation Density: 0.023%
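On dashboards of this kind, activation density generally means the share of sampled tokens on which the feature fires. At 0.023%, that is roughly 23 tokens in every 100,000. This minimal sketch assumes "fires" means an activation strictly above zero. The page states neither its threshold nor its sample size. `activation_density` is a hypothetical helper.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of activations strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy sample: 23 of 100,000 tokens have a positive activation.
    sample = [0.0] * 99_977 + [1.3] * 23
    print(f"{activation_density(sample):.3f}%")  # 0.023%
```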

    No Known Activations