INDEX
    Explanations

    phrases related to race or skin color

    references to people of color

    Negative Logits
    Token               Logit
    " preparations"     -0.60
    " hallucinations"   -0.58
    "Reloaded"          -0.58
    "Features"          -0.57
    " symptoms"         -0.57
    " rounds"           -0.57
    " veins"            -0.56
    " seams"            -0.54
    " needles"          -0.54
    "thumbnails"        -0.54
    Positive Logits
    Token               Logit
    "ortunately"        0.96
    " course"           0.85
    "icial"             0.80
    " whom"             0.75
    "pires"             0.74
    "course"            0.70
    "iciency"           0.67
    "idth"              0.66
    "sted"              0.66
    "ramer"             0.65
    Activation Density: 0.091%

    No Known Activations