INDEX
    Explanations

    contractions where the apostrophe is missing or replaced by unusual characters

    instances of negation or expressions of inability

    Negative Logits
     commons      -0.70
     heroin       -0.69
     kicker       -0.68
     polio        -0.67
     pyramid      -0.66
     Robin        -0.65
     black        -0.64
     Lob          -0.63
     Wilmington   -0.63
     Taliban      -0.62
    Positive Logits
    ï¸ı    1.18
    ¯¯    1.04
    Ì    1.01
    âĻ    1.01
    iversary    1.01
    ̶    1.00
    âĪ    0.95
    âĢ    0.94
    âĶĢâĶĢâĶĢâĶĢ    0.94
    âĶĢ    0.94
    Activation Density: 0.180%

    No Known Activations