    Negative Logits
    Token       Logit
     Anſ        -1.08
     ſever      -1.03
     Reſ        -0.99
     rain       -0.98
     ſche       -0.96
     fevere     -0.94
     pleaſure   -0.94
     reaſon     -0.94
     raiſ       -0.93
     myſelf     -0.91

    Positive Logits
    Token       Logit
    r            0.60
    hot          0.58
    head         0.58
    off          0.57
    old          0.53
    full         0.53
    che          0.52
    land         0.52
    ro           0.51
    back         0.51

    Activation Density: 0.036%

    No Known Activations