    Explanations

    phrases that indicate mechanisms of influence or control within society

    Negative Logits
    Token                   Logit
    "zed"                   -0.14
    "æľ¬å½ĵ"                -0.14
    "èIJ¥ä¸ļ"                -0.13
    ".DefaultCellStyle"     -0.13
    "à¹Ħว"                -0.13
    "erge"                  -0.13
    "ield"                  -0.13
    "é«ĺæ¸ħ"                -0.13
    "ctr"                   -0.13
    " damit"                -0.13
    Positive Logits
    Token                   Logit
    " means"                0.29
    " sheer"                0.27
    "puts"                  0.24
    "ought"                 0.24
    " various"              0.24
    "/by"                   0.23
    "put"                   0.20
    " direct"               0.20
    "ogh"                   0.20
    "means"                 0.20
    Activation Density: 0.123%

    No Known Activations