    Explanations
    Negative Logits
    Token        Logit
    " Efq"       -1.15
    "ſelf"       -1.11
    " itſelf"    -1.08
    " myſelf"    -1.02
    "ſelves"     -1.01
    " Diſ"       -1.00
    " Houſe"     -1.00
    " Anſ"       -0.97
    " Jefus"     -0.96
    " houſe"     -0.96
    Positive Logits
    Token        Logit
    " in"        0.64
    ","          0.55
    " ("         0.52
    " to"        0.50
    "."          0.48
    " for"       0.46
    "  "         0.46
    " by"        0.45
    " out"       0.45
    " more"      0.45
    Activation Density: 0.034%

    No Known Activations