INDEX
    Explanations

    socially unacceptable

    NEGATIVE LOGITS
    Token              Logit
    " Notably"         0.57
    "Notably"          0.56
    " Certainly"       0.51
    "確かに"            0.49
    "Certainly"        0.48
    " certainly"       0.47
    " importantly"     0.46
    " insgesamt"       0.45
    " Importantly"     0.45
    " Undoubtedly"     0.44

    POSITIVE LOGITS
    Token              Logit
    " tantamount"      0.82
    " indicative"      0.75
    " heresy"          0.73
    " acceptable"      0.71
    " happening"       0.68
    " unacceptable"    0.64
    " taboo"           0.64
    " unthinkable"     0.64
    " disrespectful"   0.63
    " done"            0.61
    Act Density 0.564%

    No Known Activations