    NEGATIVE LOGITS
     Reſ        -0.97
     juſt       -0.95
     ſta        -0.94
     Diſ        -0.94
     Monfieur   -0.93
     houſe      -0.92
     Houſe      -0.90
     ſche       -0.90
     faſt       -0.90
     paſſ       -0.86
    POSITIVE LOGITS
    août        0.47
     Bar        0.47
     (          0.42
     The        0.42
     di         0.41
     Peter      0.41
    不说        0.40
     leading    0.40
     Mark       0.40
     notable    0.39
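    The two tables list the tokens whose logits this feature most pushes down and most pushes up. The page does not say how the values are computed. The sketch below assumes a common SAE-dashboard convention: the feature's decoder direction is projected through the model's unembedding matrix W_U, and the results are sorted. The helper names `logit_effects` and `top_and_bottom` and the toy numbers are hypothetical.

```python
# Minimal sketch, ASSUMING logit_effect[t] = dot(decoder_direction, W_U[:, t]).
# This page does not confirm that convention; helper names are hypothetical.

def logit_effects(decoder_direction, unembed):
    """Project a feature's decoder direction through the unembedding.

    decoder_direction: list of d_model floats.
    unembed: d_model rows, each a list of vocab_size floats (W_U).
    Returns one float per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][t] for i in range(len(decoder_direction)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k=10):
    """Return the k most negative and the k most positive (token, value) pairs."""
    pairs = sorted(zip(vocab, effects), key=lambda p: p[1])  # ascending by value
    negative = pairs[:k]        # most suppressed tokens first
    positive = pairs[::-1][:k]  # most promoted tokens first
    return negative, positive


# Toy usage with made-up numbers (d_model = 2, vocab_size = 4).
vocab = [" Reſ", " The", "août", " juſt"]
direction = [1.0, -1.0]
unembed = [[-0.5, 0.25, 0.75, -0.25],
           [0.5, -0.25, 0.0, 0.5]]
neg, pos = top_and_bottom(logit_effects(direction, unembed), vocab, k=2)
print(neg)  # [(' Reſ', -1.0), (' juſt', -0.75)]
print(pos)  # [('août', 0.75), (' The', 0.5)]
```

Leading spaces in tokens such as " Reſ" matter. They mark a distinct vocabulary entry from "Reſ" without the space.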
    Activation density: 0.112%
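    The sketch below reads activation density as the fraction of sampled token positions where the feature's activation is above zero. That is an assumption: the page gives neither the sample set nor the threshold. The function name `activation_density` is hypothetical. Under this reading, 0.112% means about 112 firing positions per 100,000 tokens.

```python
# Minimal sketch, ASSUMING density = (# positions with activation > threshold) / (# positions).
# The threshold of 0.0 and the function name are hypothetical, not from the dashboard.

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy usage: 112 firing positions out of 100,000 sampled tokens.
acts = [0.0] * 99_888 + [1.0] * 112
print(f"{activation_density(acts):.3%}")  # 0.112%
```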

    No Known Activations