    Explanations

    references to turtles and related words

    Negative Logits
    Token        Logit
    "nown"       -0.08
    "roud"       -0.07
    "nya"        -0.07
    "ned"        -0.07
    "esty"       -0.07
    " thú"       -0.07
    "_ONCE"      -0.07
    "堂"         -0.07
    "teenth"     -0.07
    "ROC"        -0.07

    Positive Logits
    Token        Logit
    "adow"        0.07
    "vin"         0.07
    "ucker"       0.06
    "-shell"      0.06
    "ounds"       0.06
    "ean"         0.06
    "igidBody"    0.06
    " swimming"   0.06
    "otle"        0.06
    "gram"        0.06

    Activation Density: 0.005%

    No Known Activations