    Explanations

    socially or economically divisive

    Negative Logits
    Token         Logit
    ayashi        0.47
    archical      0.46
    adoption      0.45
    conducting    0.45
    status        0.42
    bel           0.42
    ählt          0.42
     fdPar        0.42
    𝐦             0.41
    ząd           0.41

    Positive Logits
    Token         Logit
     You          0.50
                  0.48
     go           0.48
     QQ           0.48
     I            0.47
     nearly       0.47
     Want         0.47
     antidote     0.46
     Cheer        0.46
     want         0.46
    Act Density 0.046%

    No Known Activations