INDEX
    Explanations
    No Explanations Found
    NEGATIVE LOGITS
    renheit     -0.77
     Democr     -0.75
    ãĥĺ         -0.72
    istor       -0.70
     inacc      -0.65
     horizont   -0.64
    heit        -0.63
     falsehood  -0.63
     spherical  -0.63
    sen         -0.62
    POSITIVE LOGITS
    emis         0.72
    ourse        0.66
    ĺħ           0.65
    own          0.65
    forward      0.63
    APP          0.62
    ipp          0.62
    aucus        0.62
    comings      0.62
    ighth        0.61
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.