    Explanations

    references to geographic locations, specifically focusing on names of cities or countries

    references to specific ethnic groups or nationalities

    Negative Logits
        " Edison"      -0.76
        "arily"        -0.71
        " WARN"        -0.65
        "ister"        -0.62
        "closed"       -0.62
        " Predator"    -0.62
        " envelope"    -0.61
        "angered"      -0.60
        "ODUCT"        -0.60
        "sburg"        -0.59
    Positive Logits
        "istani"        0.98
        "lers"          0.97
        "bones"         0.92
        "ler"           0.90
        "istan"         0.87
        "ling"          0.84
        "oglu"          0.84
        "wei"           0.83
        "mens"          0.83
        "lings"         0.83
    Activation Density 0.025%
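    The sketch below is only an assumption. It illustrates how dashboards like this one commonly produce the two statistics above: logit lists from projecting a feature's decoder direction through the model's unembedding, and activation density as the fraction of tokens on which the feature fires. This page does not confirm either convention, and it may normalize the scores shown. All names (`top_logit_effects`, `activation_density`, `w_dec_row`, `W_U`) are hypothetical. At 0.025%, the feature would fire on roughly 1 token in 4,000.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: k most negative and k most positive (token, score) pairs.

    Assumes the score for each token is the direct effect of the feature's
    decoder direction on that token's logit (decoder row times unembedding).
    w_dec_row: shape (d_model,); W_U: shape (d_model, vocab_size);
    vocab: list of vocab_size token strings.
    """
    scores = w_dec_row @ W_U          # one score per vocabulary token
    order = np.argsort(scores)        # token indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density(acts):
    """Hypothetical helper: fraction of tokens with a strictly positive activation."""
    return float(np.mean(np.asarray(acts) > 0))

# Toy usage with random weights (not this feature's real parameters).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
W_U = rng.normal(size=(d_model, vocab_size))
w_dec_row = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(vocab_size)]
neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=3)
```

    Sorting once and reading both ends of the order avoids scanning the vocabulary twice. Real dashboards may also rescale or filter tokens before ranking.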

    No Known Activations