    Explanations

    expressions of personal feelings or experiences

    Negative Logits
    Token          Logit
    " Efq"         -1.75
    " Monfieur"    -1.70
    " Reſ"         -1.66
    " houſe"       -1.64
    " Theſe"       -1.63
    " Houſe"       -1.60
    " Eſ"          -1.59
    " itſelf"      -1.56
    " ―――――"       -1.55
    " Anſ"         -1.55
    Positive Logits
    Token    Logit
    " I"     3.12
    "I"      1.79
    " we"    1.79
    " i"     1.61
    " We"    1.47
    " he"    1.45
    " my"    1.34
    " я"     1.26
    " He"    1.23
    " It"    1.21
    Activation Density: 0.265%

    No Known Activations