    NEGATIVE LOGITS
    Token            Logit
    "er"             -0.60
    "y"              -0.58
    "("              -0.56
    "."              -0.56
    "p"              -0.51
    " ("             -0.50
    "l"              -0.48
    "-"              -0.47
    "m"              -0.47
    "n"              -0.46
    POSITIVE LOGITS
    Token            Logit
    " itſelf"        1.61
    " myſelf"        1.58
    " Efq"           1.55
    " pleaſure"      1.54
    " themſelves"    1.52
    " Anſ"           1.51
    " Houſe"         1.49
    " Theſe"         1.49
    " Reſ"           1.48
    " reaſon"        1.47
    Activation Density: 0.245%

    No Known Activations