INDEX
    Explanations

    references to different groups of people based on their nationality or ethnicity

    Negative Logits
     Canaver
    -0.49
     Patreon
    -0.48
     ACTIONS
    -0.47
     reader
    -0.46
     spokesperson
    -0.45
     additionally
    -0.44
     organizers
    -0.44
     organisers
    -0.44
     aback
    -0.43
     archived
    -0.43
    Positive Logits
     ..."
    0.69
     …"
    0.66
    )."
    0.62
    ";
    0.58
    "))
    0.55
    )</
    0.55
    )",
    0.54
    ").
    0.52
    ");
    0.52
    "—
    0.52
    Activation Density 1.566%

    No Known Activations