    Negative Logits
    Token          Logit
    " Reſ"         -1.04
    " Anſ"         -0.95
    " pleaſure"    -0.94
    " houſe"       -0.92
    " iſt"         -0.91
    " Beſ"         -0.90
    " itſelf"      -0.89
    " Eſ"          -0.86
    " faſt"        -0.86
    " diſt"        -0.86

    Positive Logits
    Token          Logit
    " von"          3.45
    " Von"          3.01
    "Von"           2.83
    "von"           2.49
    " VON"          2.45
    "VON"           1.89
    " vom"          1.89
    "Vom"           1.30
    " Vom"          1.26
    " davon"        1.19

    Activation Density: 0.021%

    No Known Activations