INDEX
    Explanations

    proper names, possibly including surnames

    references to individuals, particularly journalists or public figures

    Negative Logits
    " WARN"        -0.71
    "llers"        -0.67
    "ously"        -0.67
    " Clarkson"    -0.67
    " cavity"      -0.65
    "lly"          -0.64
    "rency"        -0.62
    "checks"       -0.60
    " balloons"    -0.58
    "des"          -0.58

    Positive Logits
    "ivas"          1.24
    "anus"          0.86
    "terness"       0.86
    "anian"         0.82
    "ahu"           0.81
    "arov"          0.81
    "ques"          0.81
    "anos"          0.80
    "lov"           0.80
    "anas"          0.78

    Activation Density: 0.015%

    No Known Activations