    Explanations
    No Explanations Found
    Negative Logits
    Token         Logit
    "-vers"       -0.31
    "cop"         -0.30
    "zem"         -0.29
    "oud"         -0.27
    " Vers"       -0.27
    "æĪIJ份"      -0.26
    "æĪIJåĪĨ"     -0.26
    " maxi"       -0.26
    "ophobia"     -0.25
    " Maver"      -0.25
    Positive Logits
    Token         Logit
    "lib"         0.28
    " mistake"    0.26
    " Lib"        0.26
    "self"        0.25
    " bowl"       0.24
    " lib"        0.24
    " self"       0.24
    " barred"     0.24
    " tip"        0.24
    "binations"   0.24
    Activation Density: 0.012%

    No Known Activations

    This feature has no known activations.