INDEX
    Explanations

    numerical expressions or patterns

    Negative Logits
    Token       Logit
    "urette"    -0.17
    "laus"      -0.17
    "025"       -0.17
    "lev"       -0.16
    "022"       -0.16
    "024"       -0.16
    "Ŀ"         -0.15
    "624"       -0.15
    " Bruce"    -0.14
    "Bruce"     -0.14
    Positive Logits
    Token       Logit
    "34"        0.42
    "35"        0.41
    "33"        0.41
    "36"        0.39
    "37"        0.39
    "32"        0.37
    "38"        0.36
    " Thirty"   0.34
    "31"        0.34
    "39"        0.33
    Activation Density: 0.084%

    No Known Activations