    Explanations

    political or conflict discussion

    Negative Logits
    " his"          0.97
    "his"           0.88
    "hounds"        0.81
    "ij"            0.81
    " Belg"         0.81
    "legs"          0.80
    "creat"         0.79
    "enschappen"    0.79
    "instell"       0.79
    "son"           0.79
    Positive Logits
    " ModInt"       1.45
    " 매우"         1.44
    " وتق"          1.42
    " 개선"         1.41
    " 경영"         1.41
    " 스마트"       1.37
    (blank)         1.37
    " എം"           1.37
    (blank)         1.36
    " TestAvg"      1.36
    Act Density 0.000%

    No Known Activations