INDEX
    Explanations

    proper nouns related to sports, entertainment, and journalism

    Negative Logits
    category
    -0.67
    igators
    -0.66
     Newtown
    -0.65
    undreds
    -0.64
    κ
    -0.60
     Seym
    -0.60
    ousands
    -0.59
    ãĥ¼ãĥĨ
    -0.58
    iosyncr
    -0.58
     Mehran
    -0.57
    Positive Logits
     loves
    0.96
    's
    0.94
     acknowledges
    0.93
     hates
    0.90
     admits
    0.89
     knows
    0.89
     enjoys
    0.88
     concedes
    0.87
     vs
    0.87
     wrote
    0.86
    Act Density 0.231%

    No Known Activations