INDEX
    Explanations

    references to emotional states or psychological conditions

    Negative Logits
    .
    -0.32
    -0.27
    ,
    -0.26
     
    -0.26
    .↵
    -0.24
     ,
    -0.22
    p
    -0.22
    a
    -0.22
     (
    -0.22
    :
    -0.22
    POSITIVE LOGITS
    галі
    0.41
    лі
    0.36
    ві
    0.35
    рі
    0.35
    ні
    0.33
    ені
    0.33
    ті
    0.33
    елі
    0.33
    ґ
    0.33
    є
    0.32
    Act Density 0.034%

    No Known Activations