INDEX
    Explanations
    No Explanations Found
    Negative Logits
    ertodd      -0.80
    itching     -0.77
    uty         -0.74
    bryce       -0.72
    @#&         -0.71
    eno         -0.71
    vale        -0.70
    Laughs      -0.69
    paces       -0.69
    lehem       -0.68
    Positive Logits
     descendant   0.71
    bour          0.71
     Aus          0.67
     labelled     0.67
     Indust       0.66
     Bulls        0.64
     BW           0.63
     descendants  0.63
     imitation    0.63
     Bravo        0.61
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.