INDEX
    Explanations

    statements or phrases highlighting the consequences or ethical implications of actions

    Negative Logits
     maroc   -0.69
     thuy    -0.67
     gmbh    -0.66
     myn     -0.66
     ria     -0.65
     meis    -0.65
     inder   -0.65
     wien    -0.63
     sena    -0.62
     ambass  -0.62
    Positive Logits
     so      0.86
    so       0.67
     So      0.66
     paž     0.62
    So       0.62
    spesies  0.59
     SO      0.58
     zodat   0.57
             0.56
    bzw      0.53
    Activation Density 0.099%

    No Known Activations