    Negative Logits
    Token           Logit
    "<unused2130>"  1.07
    "<unused2126>"  0.97
    " `/"           0.95
    "<unused99>"    0.95
    "<unused2172>"  0.94
    "<unused2223>"  0.93
    "-​"             0.93
    "<unused2206>"  0.92
    "<unused2164>"  0.91
    "`/"            0.91
    Positive Logits
    Token   Logit
    " u"    1.38
    " t"    1.33
    " tho"  1.25
    " n"    1.20
    " r"    1.19
    " ia"   1.18
    " nt"   1.18
    " ft"   1.17
    " m"    1.17
    " ito"  1.10
    Act Density 0.027%

    No Known Activations