INDEX
    Explanations

    references to specific individuals, particularly public figures and their actions

    Negative Logits
    Token        Logit
    "points"     -0.77
    "binary"     -0.69
    "cape"       -0.69
    "isted"      -0.69
    " Franç"     -0.68
    "stice"      -0.68
    "istance"    -0.68
    "opausal"    -0.67
    "ãĥĦ"        -0.67
    "itarian"    -0.67
    Positive Logits
    Token        Logit
    " McCabe"    0.96
    "hew"        0.86
    "hews"       0.73
    "atcher"     0.73
    "plot"       0.71
    "onduct"     0.70
    "shaw"       0.67
    "ursed"      0.66
    "20439"      0.65
    "abouts"     0.64
    Activation Density: 0.005%

    No Known Activations