INDEX
    Explanations

    mentions of politicians

    Negative Logits
    Token             Logit
    "actory"          -0.76
    "urious"          -0.72
    "ventory"         -0.70
    "uran"            -0.69
    "lights"          -0.66
    " Cancel"         -0.66
    "gged"            -0.65
    "east"            -0.65
    " Condition"      -0.65
    "IER"             -0.64
    Positive Logits
    Token             Logit
    "clinton"         0.98
    "hips"            0.80
    " appoint"        0.77
    "icians"          0.77
    " correctness"    0.75
    " impe"           0.71
    "hip"             0.70
    "woman"           0.69
    " bent"           0.69
    "jriwal"          0.67
    Activation Density: 0.029%

    No Known Activations