    Explanations
    No Explanations Found
    Negative Logits
    Token          Logit
    "haar"         -0.90
    "mble"         -0.86
    "ptoms"        -0.84
    "llah"         -0.84
    "schild"       -0.84
    "gaard"        -0.79
    "restricted"   -0.78
    "beit"         -0.76
    "reau"         -0.74
    "burst"        -0.74
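    Positive Logits
    Token          Logit
    " policy"      1.17
    " policies"    1.00
    " Policy"      0.84
    " Policies"    0.77
    "policy"       0.73
    " Mayo"        0.69
    " Doodle"      0.67
    " Barbie"      0.66
    " Ike"         0.66
    " Ellison"     0.66
    (A leading space inside the quotes is part of the token, so " policy" and "policy" are distinct tokens.)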
    Activation density: 0.000%

    No Known Activations

    This feature has no known activations.