    Negative Logits
    Token          Logit
    " Houſe"       -0.97
    " Diſ"         -0.88
    " Majefty"     -0.87
    " Conſ"        -0.84
    " Reſ"         -0.84
    " houſe"       -0.83
    " myſelf"      -0.83
    "Skocz"        -0.81
    " ſche"        -0.81
    " pleaſure"    -0.81
    Positive Logits
    Token          Logit
    " for"         1.00
    " so"          0.57
    " frumos"      0.56
    " very"        0.54
    " davvero"     0.51
    " sincerely"   0.51
    "!"            0.51
    "ért"          0.51
    "Thankyou"     0.49
    " ancora"      0.49
    Act Density 0.033%

    No Known Activations