INDEX
    Explanations

    sentences related to political or controversial statements

    Negative Logits
    Token          Logit
    " embra"       -1.14
    " immen"       -1.09
    " oner"        -1.03
    " inder"       -1.02
    " incess"      -1.01
    " effe"        -0.98
    " dises"       -0.98
    " „,"          -0.98
    " interse"     -0.98
    " abnorm"      -0.97

    Positive Logits
    Token          Logit
    " no"           0.82
    " NO"           0.76
    " No"           0.71
    "no"            0.69
    "NO"            0.67
    "Nein"          0.67
    " nor"          0.67
    " not"          0.66
    "No"            0.65
    " neither"      0.63
    Activation Density: 0.067%

    No Known Activations