INDEX
    Explanations

    references to America and American identity

    Negative Logits
    ged     -0.07
    iltr    -0.07
    logen   -0.07
    ông     -0.07
     Sexe   -0.07
    Gil     -0.07
    inky    -0.07
     æĻ®    -0.06
    gil     -0.06
    ING     -0.06

    Positive Logits
    ward    0.07
    als     0.06
    <<<<    0.06
    flow    0.06
     bowl   0.06
    erif    0.06
    uzzi    0.06
    imdi    0.06
    979     0.06
    лиÑĩ    0.06
    Activation Density: 0.001%

    No Known Activations