INDEX
    Explanations

    phrases related to actions or instructions

    symbols or characters that appear repeatedly

    Negative Logits
    Token       Logit
     Donna      -0.73
     Billy      -0.72
     disse      -0.71
     dist       -0.69
     laundry    -0.68
     unbeliev   -0.68
     Harley     -0.68
     DeL        -0.67
     Miss       -0.66
     Haj        -0.66
    Positive Logits
    Token       Logit
    ĺ           1.77
    ĺħ          0.98
    right       0.95
    IJ          0.94
    ĸ           0.92
    о           0.92
    uo          0.89
    rax         0.89
    ĥ           0.89
    Ĺ           0.88
    Activation Density: 0.082%

    No Known Activations