INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token           Logit
    " Feldman"      -0.74
    "haar"          -0.70
    " detached"     -0.69
    " trump"        -0.65
    " sway"         -0.65
    " voic"         -0.64
    " fused"        -0.64
    " disg"         -0.63
    "charg"         -0.62
    "«"             -0.61
    Positive Logits
    Token           Logit
    "rast"          0.83
    "rol"           0.83
    "bernatorial"   0.73
    "advertisement" 0.72
    "aunted"        0.71
    "uristic"       0.71
    "rolled"        0.70
    "icipated"      0.70
    "Reviewer"      0.68
    "sov"           0.67
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.