INDEX
    Explanations

    terms related to political parties and their actions

    Negative Logits
    ]='\            -0.64
    ]+=             -0.59
     אשר            -0.58
    mıştır          -0.57
     darstellt      -0.56
    maktadır        -0.55
     observable     -0.52
    "]="            -0.51
    おり            -0.51
    日至            -0.50
    Positive Logits
     isn             1.39
     didn            1.37
     aren            1.36
     wasn            1.35
     wouldn          1.29
     shouldn         1.29
     ain             1.27
     weren           1.22
     really          1.21
     doesn           1.20
    Activation Density: 0.514%

    No Known Activations