INDEX
    Explanations
    Negative Logits
    Token          Logit
    " Schne"       -0.44
    " Mun"         -0.43
    " masculin"    -0.42
    "Steffen"      -0.41
    "Mun"          -0.41
    "n"            -0.41
    " מן"          -0.40
    "sen"          -0.40
    " Black"       -0.39
    "xC"           -0.39
    Positive Logits
    Token          Logit
    " diary"       2.16
    " Diary"       2.14
    "Diary"        1.97
    " diaries"     1.76
    "diary"        1.73
    " Diaries"     1.59
    "日记"         0.98
    " diário"      0.94
    "Diario"       0.94
    "日記"         0.90
    Activation Density: 0.005%

    No Known Activations