INDEX
    Explanations

    abbreviations and symbols used for expressing emphasis or directional relationships

    references to a specific demographic group, particularly focusing on individuals

    Negative Logits
    " giveaways"    -0.73
    " swept"        -0.64
    " scatter"      -0.63
    " wob"          -0.62
    " waste"        -0.62
    " sweep"        -0.61
    " dispers"      -0.61
    " OM"           -0.60
    "romy"          -0.60
    " mammoth"      -0.60
    Positive Logits
    "¹"             0.95
    "¬"             0.90
    "¡"             0.88
    "ername"        0.84
    "who"           0.83
    "Ī"             0.83
    "ı"             0.81
    "ij"            0.80
    "ª"             0.78
    "Ĵ"             0.78
    Activation Density: 0.204%

    No Known Activations