    Explanations

    explicit mentions of "specific" things or topics

    phrases indicating specificity in context

    Negative Logits
    Token          Logit
    "rican"        -0.72
    "OWER"         -0.71
    "mol"          -0.67
    " Feldman"     -0.65
    "911"          -0.64
    " Springer"    -0.64
    " Dinosaur"    -0.62
    "http"         -0.61
    " Neighbor"    -0.61
    " Hilton"      -0.61
    Positive Logits
    Token          Logit
    "ities"        1.07
    "ally"         1.01
    "ivity"        0.92
    "itarian"      0.91
    "arily"        0.90
    "iveness"      0.88
    "ivities"      0.88
    "ality"        0.83
    "iations"      0.83
    "atively"      0.82
    Activation Density: 0.007%

    No Known Activations