INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token            Logit
    "insula"         -0.74
    "jer"            -0.71
    "sed"            -0.70
    "Jer"            -0.67
    "abol"           -0.67
    "pn"             -0.67
    "olor"           -0.66
    "TOR"            -0.65
    "ratulations"    -0.64
    "arkin"          -0.64
    Positive Logits
    Token            Logit
    " SX"            0.68
    " demos"         0.68
    "orce"           0.67
    " demo"          0.65
    " agre"          0.64
    "ufact"          0.64
    "oti"            0.63
    "fell"           0.62
    " VG"            0.62
    " Wim"           0.61
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.