INDEX
    Explanations

    names of famous individuals

    proper nouns, particularly personal names and familial relationships

    Negative Logits
    Token             Logit
    " LEVEL"          -0.72
    "pmwiki"          -0.71
    "tracking"        -0.71
    " Borderlands"    -0.68
    " arbitration"    -0.67
    " dystopian"      -0.65
    " merit"          -0.65
    "CONCLUS"         -0.65
    " sampling"       -0.65
    " polar"          -0.63
    Positive Logits
    Token             Logit
    "mie"             0.98
    "hyde"            0.96
    " Jr"             0.95
    "abeth"           0.91
    "andro"           0.88
    "nie"             0.88
    "mi"              0.87
    "ilde"            0.87
    "lynn"            0.87
    "anne"            0.86
    Activation Density: 0.193%

    No Known Activations