INDEX
    Explanations

    instances of governmental or political power dynamics related to entities and actions

    Negative Logits
    Token          Logit
    " &=&"         -0.74
    " fallu"       -0.65
    "&=&\"         -0.62
    "&=&"          -0.60
    "她們"         -0.57
    " करती"        -0.54
    " которое"     -0.53
    " Оно"         -0.52
    "它們"         -0.52
    "它们"         -0.51

    Positive Logits
    Token          Logit
    " his"         3.78
    " he"          3.11
    " him"         3.05
    "his"          2.88
    " himself"     2.81
    "彼は"         2.77
    "彼の"         2.72
    "彼が"         2.60
    "himself"      2.30
    "他的"         2.30

    Activation Density: 5.268%

    No Known Activations