INDEX
    Explanations

    names of historical figures and scientists

    Negative Logits
    "BUG"            -0.14
    "allas"          -0.13
    "oring"          -0.13
    "avid"           -0.13
    "CardContent"    -0.13
    "subst"          -0.12
    "βα"             -0.12
    "ç¡"             -0.12
    "aling"          -0.12
    "êu"             -0.12

    Positive Logits
    "Ã¶ÄŁ"            0.15
    " Jr"             0.15
    "psc"             0.15
    "gran"            0.14
    " Lion"           0.14
    "âĢł"             0.13
    " Smy"            0.13
    "192"             0.12
    "iena"            0.12
    "mann"            0.12
    Activation Density: 0.104%

    No Known Activations