INDEX
    Explanations

    questions and concerns about technological implementation and safety

    Negative Logits
     Efq         -1.46
     Jefus       -1.45
     myſelf      -1.35
     Monfieur    -1.30
     Eſ          -1.26
    ſelf         -1.25
     $_"         -1.24
     chofe       -1.22
     Anſ         -1.22
     Reſ         -1.21
    Positive Logits
    (token not shown)   1.11
     …           1.04
    <eos>        1.04
    ...          1.01
     ...         0.98
    </i>         0.95
    ….           0.88
    ....         0.87
    ……           0.87
     […]         0.86
    Activation Density: 0.127%

    No Known Activations