INDEX
    Explanations

    phrases emphasizing freedom and rights-related themes

    Negative Logits
    icom       -0.16
    achuset    -0.16
    eral       -0.15
    amak       -0.15
    annes      -0.14
    abant      -0.14
    ges        -0.14
    antis      -0.14
    çĥ¦        -0.14
    olla       -0.14
    Positive Logits
    odo        0.19
    y          0.16
    ALES       0.14
    à¤Ĥध       0.14
    ODO        0.14
     GÃľ       0.14
    kö         0.13
    eldorf     0.13
    ëł         0.13
    .nz        0.13
    Activation Density: 0.088%

    No Known Activations