INDEX
    Explanations

    Learn/Study

    Negative Logits
     Coch   -0.09
    hoe     -0.09
    øring   -0.08
     Osm    -0.08
     coron  -0.08
    uity    -0.08
     Barn   -0.07
     Aby    -0.07
    れて    -0.07
     doj    -0.07
    Positive Logits
    fair    0.08
    prot    0.08
     aff    0.08
            0.08
     पै     0.07
     fuck   0.07
    ায়      0.07
     helium 0.07
    gj      0.07
    Fair    0.07
    Activation Density: 0.008%

    No Known Activations