INDEX
    Explanations

    proper nouns, specifically names of people

    proper nouns and terms related to specific individuals and entertainment content

    Negative Logits
    Token          Logit
    "llor"         -0.85
    " Slayer"      -0.68
    " Dahl"        -0.68
    " archived"    -0.66
    "ersen"        -0.66
    "UTERS"        -0.66
    " Hunts"       -0.66
    "owered"       -0.65
    "erers"        -0.65
    "erences"      -0.64

    Positive Logits
    Token          Logit
    "forward"      0.93
    "creen"        0.89
    "acies"        0.84
    "nces"         0.84
    "acy"          0.84
    "ahime"        0.82
    "endiary"      0.72
    "Asia"         0.71
    "ourgeois"     0.71
    "law"          0.71

    Activation Density: 0.044%

    No Known Activations