    Explanations
    No Explanations Found
    Negative Logits
    "elon"          -0.15
    "abant"         -0.14
    "arası"         -0.14
    "eron"          -0.14
    "-Col"          -0.14
    ".synthetic"    -0.14
    "ERRU"          -0.14
    "æ§"            -0.14
    " CACHE"        -0.14
    " prest"        -0.14
    Positive Logits
    " bad"           0.17
    "anken"          0.16
    " Bad"           0.16
    " BAD"           0.15
    " Fallen"        0.15
    " man"           0.15
    ".man"           0.15
    " shared"        0.14
    " Bair"          0.14
    " mann"          0.14
    Act Density 0.000%

    No Known Activations

    This feature has no known activations.