    Explanations

    numerical data or figures within the document

    Negative Logits
    hammer      -0.16
    oller       -0.15
    aktu        -0.15
    vara        -0.14
    мещ         -0.14
    особ        -0.14
     Proud      -0.14
     wr         -0.14
     Fen        -0.14
    bil         -0.14

    Positive Logits
     LENG       0.16
     Zwe        0.15
    ABS         0.15
    rypton      0.14
    daş         0.14
    avaş        0.14
    etadata     0.14
    љ           0.14
    vanished    0.14
    档          0.14

    Activation Density: 0.005%

    No Known Activations