    Explanations

    stereotypes and prejudice

    Negative Logits
        " according"   0.41
        "Youth"        0.41
        " Youth"       0.40
        (token not displayed)   0.40
        "ンプル"       0.39
        "蹿"           0.39
        "Sat"          0.38
        " Histories"   0.37
        " beaches"     0.37
        " 다양"        0.37
    Positive Logits
        " découvert"   0.47
        " croire"      0.44
        " découvrir"   0.43
        " visiting"    0.42
        " visitare"    0.42
        " belladone"   0.42
        " hvis"        0.41
        " choix"       0.40
        "setRoi"       0.39
        " répar"       0.39
    Activation Density: 0.000%

    No Known Activations