INDEX
    Explanations

    Phrases that summarize or evaluate their subjects positively.

    Negative Logits
        Token              Logit
        (missing)          -0.76
        "LLocation"        -0.67
        "хьтан"            -0.67
        " يتيمه"           -0.66
        " ***!"            -0.63
        "الإنجليزية"       -0.62
        " Italijani"       -0.62
        " queſta"          -0.62
        " wireType"        -0.62
        "rungsseite"       -0.62
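    Positive Logits
        Token              Logit   Gloss
        "Overall"          0.90
        " overall"         0.90
        " Overall"         0.87
        "overall"          0.78
        "总体"             0.63    Chinese (simplified): "overall"
        "OVERALL"          0.61
        "整體"             0.60    Chinese (traditional): "whole, overall"
        "整体"             0.54    Chinese (simplified): "whole, overall"
        " Insgesamt"       0.52    German: "overall, in total"
        " keseluruhan"     0.50    Indonesian/Malay: "entirety, overall"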
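Per-token logit lists like these are commonly obtained by taking the dot product of the feature's decoder direction with each token's unembedding vector, then sorting the scores. The page does not say how its numbers were computed, so the sketch below assumes that method. The vocabulary, vectors, and helper names (`feature_logits`, `top_and_bottom`) are hypothetical, and the toy scores are not this feature's values.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]
Ranked = List[Tuple[str, float]]


def feature_logits(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Scores:
    """Score every token by the dot product of the feature's decoder
    direction with that token's unembedding vector (assumed method)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(scores: Scores, k: int) -> Tuple[Ranked, Ranked]:
    """Return the k highest-scoring and the k lowest-scoring tokens."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending order
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Hypothetical toy model: 3-dimensional residual stream, 5-token vocabulary.
decoder_dir = [1.0, 0.5, -0.2]
unembed = {
    " overall": [0.8, 0.2, 0.0],
    "Overall": [0.7, 0.4, 0.1],
    " the": [0.0, 0.1, 0.0],
    " wireType": [-0.5, -0.3, 0.2],
    "LLocation": [-0.6, -0.4, 0.5],
}

pos, neg = top_and_bottom(feature_logits(decoder_dir, unembed), k=2)
for tok, score in pos + neg:
    print(f"{tok!r:>12} {score:+.2f}")
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with a single ranking.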
    Activation density: 0.006%
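Activation density is the share of sampled tokens on which the feature fires, so 0.006% corresponds to roughly 6 tokens in every 100,000. A minimal sketch, assuming "fires" means a strictly positive activation; `activation_density` and the activation list are hypothetical, not data from this feature.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature is active (activation > 0)."""
    if not activations:
        raise ValueError("need at least one token")
    return sum(1 for a in activations if a > 0) / len(activations)


# Hypothetical sample: the feature fires on 6 of 100,000 tokens.
acts = [0.0] * 100_000
for i in (10, 2_000, 30_000, 45_000, 70_000, 99_999):
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.006%
```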

    No Known Activations