    Explanations
    No Explanations Found
    Negative Logits
    "onew"          -0.72
    "locked"        -0.68
    " honour"       -0.65
    "ocracy"        -0.62
    "rimination"    -0.62
    " Naz"          -0.62
    " neutrality"   -0.60
    "uga"           -0.59
    "freedom"       -0.58
    "icides"        -0.58
    Positive Logits
    "AMI"                 0.77
    "engers"              0.62
    " RTX"                0.62
    " Carlson"            0.62
    "AS"                  0.61
    "larg"                0.61
    "~~~~~~~~~~~~~~~~"    0.60
    " resemb"             0.59
    " Dimensions"         0.59
    " Adult"              0.59
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.