    Explanations

    politically charged language and references to authority figures or governmental actions

    Negative Logits
    Aiheesta          -0.47
    RegressionTest    -0.44
     care             -0.43
     lieb             -0.43
     caring           -0.42
    ORE               -0.41
    ENBERG            -0.40
                      -0.40
     зала             -0.39
    ÍTULO             -0.39
    Positive Logits
    自分も            1.17
     myself           1.02
    僕も              0.96
     yours            0.94
     own              0.93
    myself            0.83
    Erreferentziak    0.83
     ourselves        0.81
     yourself         0.78
    私も              0.78
    Activation Density: 0.335%

    No Known Activations