INDEX
    Explanations
    Negative Logits
     itſelf      -1.27
     Efq         -1.16
     Houſe       -1.10
     Jefus       -1.05
     Majefty     -1.03
     ―――――       -1.02
     Theſe       -1.02
     pleaſure    -1.00
     Diſ         -0.98
     Eſ          -0.97
    Positive Logits
    ↵↵           0.76
     all         0.71
                 0.70
                 0.69
                 0.67
    '            0.66
     “           0.65
    "            0.63
    The          0.62
                 0.60
    Act Density 0.086%

    No Known Activations