INDEX
    Explanations

    phrases indicating clarification or elaboration on previous statements

    "In other words" or similar rephrasing

    Negative Logits
     myſelf       -0.96
     houſe        -0.95
     Houſe        -0.95
     itſelf       -0.95
     Efq          -0.93
     pleaſure     -0.91
     ―――――        -0.90
     Majefty      -0.89
     ་་           -0.87
    ſelves        -0.86
    Positive Logits
     they         1.03
     it           0.95
     the          0.94
    :             0.91
     we           0.86
    ,             0.86
     a            0.85
     “            0.78
     "            0.77
     how          0.75
    Activation Density: 0.228%

    No Known Activations