INDEX
    Explanations

    offering elaborations or alternatives

    Negative Logits
    procedures  0.89
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.86
    ↵↵↵↵↵↵↵↵  0.85
    ↵↵↵↵↵↵↵  0.85
    ↵↵↵↵↵↵↵↵↵  0.85
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.84
    ↵↵↵↵↵↵↵↵↵↵↵↵↵  0.84
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.84
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.84
    ↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.84
    Positive Logits
    Note  1.56
    Edit  1.55
    EDIT  1.39
    Alternatively  1.36
    Bonus  1.34
    To  1.33
    edit  1.27
    PS  1.25
    Also  1.23
    NOTE  1.20
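    One common way dashboards like this derive the two logit lists is to project a feature's decoder direction through the model's unembedding matrix and keep the highest- and lowest-scoring vocabulary tokens. The sketch below assumes that method; the function name `logit_effects`, the layout of `unembed` (d_model rows by vocabulary-size columns), and the parameter `k` are hypothetical choices, not this page's actual pipeline. It returns signed scores, whereas the negative list above prints its values without a minus sign.

```python
# Hypothetical sketch: rank vocabulary tokens by how strongly a feature's
# decoder direction raises or lowers their logits (assumed method).
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return (top_positive, top_negative) lists of (token, score) pairs.

    decoder_dir: the feature's decoder vector, length d_model (assumed).
    unembed: d_model x vocab_size matrix, one row per residual dimension (assumed).
    vocab: token strings, one per unembedding column.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per decoder dimension")
    # score[t] = dot product of the decoder direction with column t of unembed
    scores = [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
              for t in range(len(vocab))]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # most strongly suppressed tokens
    top_positive = ranked[::-1][:k]  # most strongly promoted tokens
    return top_positive, top_negative
```

    With k=10 this yields two ten-entry lists, matching the length of each list above.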
    Act Density 0.140%
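    Reading Act Density as the fraction of token positions on which the feature fires (an assumption about this page's definition), 0.140% corresponds to roughly 1.4 active positions per 1,000 tokens (0.140 / 100 × 1,000 = 1.4). A minimal sketch of that computation follows; the function name `activation_density` and the `threshold` parameter (activation strictly above zero counts as active) are illustrative assumptions.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where a feature's activation exceeds a threshold.
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percent of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

    For example, a feature active at 14 of 10,000 positions would report a density of 0.14%.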

    No Known Activations