    Explanations
    No Explanations Found
    Negative Logits
    undo    -0.76
    ouf     -0.71
    very    -0.68
    enty    -0.66
    any     -0.66
    elong   -0.65
    fair    -0.63
    lore    -0.63
    oos     -0.62
    Champ   -0.61
    Positive Logits
     regard     1.11
     regards    1.10
    stood       0.88
    standing    0.87
     impunity   0.84
     caveats    0.82
    ategory     0.73
     intent     0.72
     roomm      0.71
     respect    0.68
    Act Density 0.000%

    This feature has no known activations.