INDEX
    Explanations

    references to historical and societal flaws or injustices

    Negative Logits
    Token             Logit
    " obtenu"         -0.51
    " mid"            -0.50
    "FINALLY"         -0.48
    "󠁿"               -0.47
    "Życiorys"        -0.47
    "GraphicsUnit"    -0.46
    " reçu"           -0.45
    "vist"            -0.45
    " latine"         -0.44
    "andet"           -0.44

    Positive Logits
    Token             Logit
    " similar"        0.78
    "今回も"          0.77
    "similar"         0.75
    "Similar"         0.71
    " similarly"      0.70
    " ähnliche"       0.70
    " Similar"        0.69
    " SIMILAR"        0.68
    " analogous"      0.67
    " Ähn"            0.66

    Activation Density: 0.527%

    No Known Activations