INDEX
    Explanations
    NEGATIVE LOGITS
            1.39
            1.33
    ”)      1.30
    !)      1.29
    )       1.23
    !)      1.22
    ')      1.22
    ")      1.19
    ’)      1.19
            1.11
    POSITIVE LOGITS
    ():     2.96
     "":    2.52
    ":      2.47
    ':      2.42
     ():    2.42
    ]:      2.30
     '':    2.29
    ):      2.29
    %:      2.26
    \":     2.25
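    Lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the tokens with the largest and smallest resulting logit effects. The sketch below assumes that convention and ignores the final LayerNorm; `top_logits`, `decoder_dir`, `W_U`, and the toy vocabulary are hypothetical stand-ins, not data from this feature.

```python
import numpy as np


def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most positive and k most negative direct logit effects.

    decoder_dir: (d_model,) decoder direction of one feature (assumed input).
    W_U: (d_model, d_vocab) unembedding matrix (assumed input).
    vocab: list of d_vocab token strings.
    Assumption: final LayerNorm is ignored (plain direct projection).
    """
    effects = decoder_dir @ W_U   # (d_vocab,) effect on each token's logit
    order = np.argsort(effects)   # token ids sorted ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: d_model = 2 and a hypothetical 3-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[2.0, -1.0, 0.5],
                [0.0,  3.0, 0.0]])
vocab = ["():", '")', "x"]
positive, negative = top_logits(decoder_dir, W_U, vocab, k=1)
print(positive)  # [('():', 2.0)]
print(negative)  # [('")', -1.0)]
```

    Sorting once with `np.argsort` and reading both ends keeps the positive and negative lists consistent with each other.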
    Activation Density: 0.319%
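    Activation density is usually read as the fraction of sampled tokens on which the feature's activation is nonzero; 0.319% is roughly 3 tokens in every 1,000. A minimal sketch, assuming that definition and a hypothetical `acts` array of per-token activations:

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Toy example: 1,000 hypothetical tokens, 3 of which activate the feature.
acts = np.zeros(1000)
acts[[10, 500, 999]] = [0.8, 1.2, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.300%
```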

    No Known Activations