INDEX
    Explanations

    proper nouns related to individuals and places

    Negative Logits
     Manz
    -0.65
     Sussex
    -0.64
    LESS
    -0.64
     Terrorism
    -0.64
    âĶĢâĶĢâĶĢâĶĢ
    -0.63
     BDS
    -0.62
     tour
    -0.60
     srfAttach
    -0.60
     theater
    -0.59
     Yoga
    -0.59
    Positive Logits
    yth
    1.23
    worn
    0.91
    cffff
    0.90
    cale
    0.88
    haw
    0.86
    ulf
    0.85
    IGH
    0.85
    ¼
    0.85
    igh
    0.85
    eah
    0.84
    Activation Density: 0.005%

    No Known Activations