    Explanations

    proper nouns, specifically names of people and their associated attributes

    Negative Logits
    Token               Logit
    " Putih"            -0.57
    " Hitam"            -0.55
    " financeira"       -0.51
    "parken"            -0.50
    " Internasional"    -0.50
    " econômica"        -0.49
    " Mulher"           -0.48
    "hallen"            -0.47
    " Kerk"             -0.47
    " Ström"            -0.45
    Positive Logits
    Token               Logit
    "mann"              0.66
    "berger"            0.53
    "ke"                0.50
    "berg"              0.48
    "heck"              0.47
    "hardt"             0.47
    " seaborn"          0.47
    "lma"               0.46
    "hold"              0.46
    " mann"             0.45
    Activation Density: 0.419%

    No Known Activations