INDEX
    Explanations
    No Explanations Found
    Negative Logits
        Token          Logit
        "hester"       -0.84
        "cientious"    -0.84
        "lins"         -0.83
        "uay"          -0.81
        "itus"         -0.79
        "abilia"       -0.79
        "amina"        -0.78
        "umption"      -0.78
        "idth"         -0.77
        "atem"         -0.76
    Positive Logits
        Token          Logit
        " cry"          0.69
        "qs"            0.69
        " weep"         0.69
        " Siri"         0.68
        " mashed"       0.62
        " blinded"      0.60
        " snap"         0.59
        " Aviv"         0.58
        " shout"        0.58
        "ksh"           0.57
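    The page does not say how these two lists were produced. A common approach, assumed here, is to project the feature's decoder direction onto the model's unembedding matrix and keep the k highest and k lowest scores. The sketch below illustrates that idea in pure Python; the function names feature_logits and top_and_bottom, and the toy matrices, are hypothetical and not taken from any specific library.

```python
def feature_logits(decoder_row, unembed):
    """Score every vocabulary token against one feature direction.

    decoder_row: list of d_model floats (assumed: the feature's decoder row).
    unembed: d_model rows, each a list of vocab_size floats (assumed: W_U).
    Returns a list of vocab_size scores (dot products).
    """
    d_model = len(decoder_row)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(logits, vocab, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    order = sorted(range(len(logits)), key=lambda t: logits[t])  # ascending
    bottom = [(vocab[t], logits[t]) for t in order[:k]]
    top = [(vocab[t], logits[t]) for t in reversed(order[-k:])]  # descending
    return top, bottom


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
    decoder_row = [1.0, -0.5]
    unembed = [[0.25, -0.5, 0.75, 0.0], [0.5, 0.25, -0.25, 1.0]]
    scores = feature_logits(decoder_row, unembed)
    top, bottom = top_and_bottom(scores, ["a", "b", "c", "d"], 2)
    print(top)     # [('c', 0.875), ('a', 0.0)]
    print(bottom)  # [('b', -0.625), ('d', -0.5)]
```

    Under this assumption, a row in Positive Logits would be a token the feature pushes the model toward predicting, and a row in Negative Logits a token it pushes away from.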
    Activation Density: 0.000%
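    Activation density is usually read as the fraction of sampled tokens on which the feature fires. That definition is an assumption here, since the page does not define it. The minimal sketch below computes it from a hypothetical list of per-token activations and formats it the way the page displays it. A value of 0.000% is consistent with the feature having no recorded activations.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


def format_density(density):
    """Render a fraction as a percentage with three decimals, e.g. '0.000%'."""
    return f"{density * 100:.3f}%"


if __name__ == "__main__":
    # Hypothetical activations: the feature never fires on 1000 tokens.
    print(format_density(activation_density([0.0] * 1000)))  # 0.000%
    # Hypothetical activations: it fires on 2 of 4 tokens.
    print(format_density(activation_density([0.0, 1.2, 0.0, 0.3])))  # 50.000%
```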

    No Known Activations

    This feature has no known activations.