    Explanations

    Lists of names

    Negative Logits
    uous
    -0.07
    illus
    -0.07
                                   
    -0.06
     crane
    -0.06
     colonial
    -0.06
     Latino
    -0.06
     assistance
    -0.06
    onas
    -0.06
    ξ
    -0.06
     مث
    -0.06
    Positive Logits
     Winners
    0.06
    Aaron
    0.06
     grim
    0.06
    AJ
    0.06
    0.06
     dictionary
    0.06
     Hard
    0.06
     Mein
    0.06
    itesse
    0.06
    0.06
    Activation Density 0.054%

    No Known Activations