INDEX
    Explanations

    words beginning with "Sur"

    Negative Logits
    Token            Logit
    " itſelf"        -1.04
    " myſelf"        -1.00
    " occaf"         -0.96
    " raiſ"          -0.94
    " Efq"           -0.93
    " pleaſure"      -0.92
    " Monfieur"      -0.91
    " neceff"        -0.87
    " poffe"         -0.87
    " chofe"         -0.87
    Positive Logits
    Token            Logit
    " W"             0.48
    " with"          0.47
    "kal"            0.46
    " B"             0.46
    " ist"           0.43
    " to"            0.42
    " L"             0.42
    "Transcription"  0.42
    " ("             0.41
    "t"              0.41
    Activation Density: 0.021%

    No Known Activations