    Negative Logits
    Token               Logit
    "oldt"              -0.21
    "316"               -0.15
    "uzzy"              -0.15
    "datable"           -0.15
    "чат"               -0.14
    "-таки"             -0.14
    "ows"               -0.14
    "ning"              -0.14
    ".scalablytyped"    -0.14
    ".Euler"            -0.14
    Positive Logits
    Token               Logit
    "jar"               0.23
    " jar"              0.23
    " Jar"              0.22
    "Jar"               0.21
    " cutter"           0.19
    "pedia"             0.19
    "icients"           0.17
    " Dough"            0.17
    "ilde"              0.17
    " dough"            0.15
    Activation Density: 0.015%

    No Known Activations