INDEX
    Explanations

    phrases indicating approximate quantities or numbers

    Negative Logits
    pes       -0.17
    orsi      -0.16
    isci      -0.15
    $self     -0.15
    hey       -0.15
    idy       -0.14
    OOT       -0.14
    asil      -0.14
    Ñĩа       -0.14
    ãĥ³ãĥIJ    -0.14
    Positive Logits
     dozen    0.18
    ;element  0.16
    150       0.14
    600       0.14
    erto      0.13
    akk       0.13
    lier      0.13
    avel      0.13
     Spurs    0.13
    íĥĪ       0.13
    Activation Density: 0.049%

    No Known Activations