INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Person       -0.79
    20439        -0.77
    person       -0.76
    mal          -0.73
    people       -0.73
    Nob          -0.72
     "$:/        -0.71
    woman        -0.70
    hor          -0.69
     Michaels    -0.66
    Positive Logits
     shut        0.95
    wagen        0.79
    icago        0.77
    oaded        0.70
    enaries      0.70
    ornia        0.69
    ttle         0.69
    ifax         0.66
    nr           0.66
    ysis         0.65
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.