INDEX
    Explanations

    words related to various forms of bias

    Negative Logits
    Token       Logit
    " Ke"       -0.15
    " sine"     -0.15
    "vo"        -0.14
    " SI"       -0.14
    "OI"        -0.14
    "自治"      -0.13
    "_SI"       -0.13
    " Bond"     -0.13
    " poll"     -0.13
    "ów"        -0.13
    Positive Logits
    Token             Logit
    "eczy"            0.18
    "ogg"             0.18
    "keit"            0.16
    "kees"            0.15
    "istrovství"      0.15
    "ayd"             0.15
    "nya"             0.15
    "readcr"          0.14
    "ContextHolder"   0.14
    "μι"              0.14
    Activation Density: 0.006%

    No Known Activations