    Explanations

    references to social reform and progressive policies

    Negative Logits
    Token            Logit
    "anta"           -0.15
    "avana"          -0.15
    "æµľ"            -0.14
    "stats"          -0.14
    "ãĤ·ãĥ¼"         -0.14
    "arker"          -0.14
    "edException"    -0.13
    "/stats"         -0.13
    ".son"           -0.13
    "è£ı"            -0.13
    Positive Logits
    Token                Logit
    " universal"         0.22
    "universal"          0.19
    " Universal"         0.19
    "Universal"          0.17
    " univers"           0.16
    "iversal"            0.16
    " Transportation"    0.16
    "ewe"                0.16
    " UNIVERS"           0.16
    " repar"             0.15
    Activation Density: 0.081%

    No Known Activations