INDEX
    Explanations

    references to freedom and related rights or protections

    Negative Logits
    "antry"                -0.15
    " Kültür"              -0.15
    "imen"                 -0.15
    " ØŃاضر"               -0.15
    "#!"                   -0.15
    "longleftrightarrow"   -0.14
    "ester"                -0.14
    "urban"                -0.14
    "'];?>"                -0.14
    "ocket"                -0.13

    Positive Logits
    " fighters"             0.24
    " Fighters"             0.23
    " fighter"              0.22
    " Fighter"              0.19
    "ibold"                 0.18
    "/lib"                  0.18
    "fighters"              0.18
    "Freedom"               0.17
    " Freedom"              0.17
    " loving"               0.17
    Activation Density: 0.018%

    No Known Activations