INDEX
    Explanations

    terms related to social concepts and interactions

    Negative Logits
    "aler"      -0.16
    "inness"    -0.15
    "idis"      -0.14
    "annies"    -0.14
    "aligned"   -0.14
    "ulla"      -0.14
    "gom"       -0.14
    "áÅĻ"       -0.14
    "ptune"     -0.14
    "bsp"       -0.14
    Positive Logits
    "izing"         0.28
    "ization"       0.28
    "ize"           0.25
    " distancing"   0.24
    "ite"           0.24
    " media"        0.24
    "ized"          0.23
    " justice"      0.23
    "ising"         0.22
    "-media"        0.21
    Act Density 0.031%

    No Known Activations