    Explanations

    references to societal structures and influences

    Negative Logits
        Token       Logit
        " Gall"     -0.15
        "arend"     -0.15
        " cant"     -0.15
        " Stra"     -0.15
        "adio"      -0.15
        " tier"     -0.15
        "arga"      -0.14
        "itis"      -0.14
        " Bened"    -0.14
        "iggins"    -0.14
    Positive Logits
        Token       Logit
        "جب"        0.15
        "idl"       0.15
        "307"       0.15
        "yx"        0.14
        "عب"        0.14
        "arness"    0.14
        "DRAM"      0.14
        "rams"      0.14
        "MEDIA"     0.14
        "imore"     0.13
    Activation Density: 0.187%

    No Known Activations