INDEX
    Explanations

    questions and reflections on the nature of power, responsibilities, and the importance of speaking up

    Negative Logits
     poffible
    -0.85
     дописавши
    -0.84
     itſelf
    -0.78
     pleaſure
    -0.78
     ſeveral
    -0.77
     Diſ
    -0.77
     neceff
    -0.76
     purpoſe
    -0.76
     ſever
    -0.76
     myſelf
    -0.75
    Positive Logits
     certainly
    0.52
     _$
    0.51
    findpost
    0.48
     also
    0.47
     altrett
    0.46
     likewise
    0.45
    (!__
    0.44
     ditto
    0.44
    MathML
    0.44
    ...(
    0.43
    Activation Density: 0.689%

    No Known Activations