    Explanations

    pleasure honor privilege

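    Negative Logits
    "要注意"        0.39   (Japanese: "caution, requires attention")
    "👎"            0.38
    " buttery"      0.38
    " mocking"      0.37
    "Yep"           0.37
    " উঠিল"         0.36   (Bengali: "rose, got up")
    "orestation"    0.36
    "してた"        0.36   (Japanese, colloquial: "was doing")
    " hardcore"     0.35
    "realistic"     0.35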
    Positive Logits
    " privilege"    2.45
    " privil"       2.05
    " Privilege"    2.05
    " privileged"   2.02
    " privileges"   2.00
    " privilegio"   2.00
    " privile"      1.83
    " privilegi"    1.77
    "Priv"          1.75
    " pleasure"     1.72
    Activation density: 0.023%

    No Known Activations