    Negative Logits
        Token (quoted; leading space is part of the token)    Logit
        "taxon"                                               -0.80
        " recette"                                            -0.75
        "ulsions"                                             -0.73
        "itars"                                               -0.72
        "得点"                                                -0.71
        " coder"                                              -0.71
        "Suns"                                                -0.68
        " giggles"                                            -0.67
        "TokenNameR"                                          -0.66
        " AKP"                                                -0.66
    Positive Logits
        Token (quoted; leading space is part of the token)    Logit
        " contour"                                             3.23
        " contours"                                            3.22
        "contour"                                              2.91
        "Contour"                                              2.66
        "contours"                                             2.64
        "Contours"                                             2.55
        " Contour"                                             2.44
        " Cont"                                                1.90
        " iso"                                                 1.89
        " isother"                                             1.70
    Activation Density: 0.028%

    No Known Activations
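    Feature dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the result; activation density is usually the fraction of tokens on which the feature's activation is above zero. The sketch below assumes that convention. The function and parameter names (`top_logits`, `activation_density`, `w_dec_row`, `w_unembed`) are hypothetical, and it ignores details such as a final layer norm that a real pipeline may fold in.

```python
import numpy as np


def top_logits(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: logit effect = decoder direction (d_model,) @ unembedding (d_model, vocab_size).
    """
    logits = w_dec_row @ w_unembed  # shape (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top


def activation_density(acts):
    """Fraction of tokens on which the feature fires (activation > 0)."""
    return float(np.mean(np.asarray(acts) > 0))


if __name__ == "__main__":
    # Toy data only: random weights stand in for a real SAE and model.
    rng = np.random.default_rng(0)
    d_model, vocab_size = 8, 20
    vocab = [f"tok{i}" for i in range(vocab_size)]
    bottom, top = top_logits(
        rng.normal(size=d_model), rng.normal(size=(d_model, vocab_size)), vocab, k=3
    )
    print("negative:", bottom)
    print("positive:", top)
    print("density:", activation_density(rng.normal(size=1000)))
```

    Sorting the full logit vector once and slicing both ends keeps the positive and negative lists consistent with each other. A density of 0.028%, as reported above, would correspond to a return value of 0.00028.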