    NEGATIVE LOGITS
    Descripción    -0.08
     descrição    -0.08
    Descrição    -0.07
    Codes    -0.07
    颜色    -0.07
     procur    -0.07
     muv    -0.07
    方便    -0.07
    ppt    -0.07
     steaming    -0.07
    POSITIVE LOGITS
     rhetoric    0.10
     convain    0.10
     provocative    0.10
     criticizing    0.10
     skepticism    0.10
     രാഷ്ട്രീയ    0.10
     outspoken    0.09
     accusing    0.09
    观点    0.09
     überzeug    0.09
    Activation Density: 0.290%

    No Known Activations