INDEX
    Explanations

    mentions of the public figure Winfrey, particularly in a negative context

    Negative Logits
    ea       -0.16
    yll      -0.16
    een      -0.16
    east     -0.16
    INO      -0.15
    оÑĦ      -0.15
    ino      -0.15
     Krish   -0.14
    venge    -0.14
    eron     -0.14
    Positive Logits
    ipeg        0.24
    ograd       0.19
    -win        0.19
    nable       0.18
    row         0.18
    throp       0.17
    nesota      0.17
    now         0.16
    ERRU        0.16
    .UltraWin   0.16
    Activation Density: 0.025%

    No Known Activations