INDEX
    Explanations
    Negative Logits
        Token          Logit
        " Liberties"   -0.67
        "trak"         -0.66
        "WI"           -0.64
        "vious"        -0.64
        "supp"         -0.63
        "nce"          -0.63
        "ld"           -0.61
        " Popular"     -0.60
        "isan"         -0.59
        " Nou"         -0.59
    Positive Logits
        Token          Logit
        " twins"        1.38
        "poons"         0.87
        " orphans"      0.81
        "omnia"         0.79
        "omething"      0.79
        "idious"        0.74
        "hips"          0.73
        "folk"          0.72
        "roo"           0.72
        "peak"          0.71
    Activation Density: 0.005%

    No Known Activations