INDEX
    Explanations
    Negative Logits
     Мы    -2.13
    (no visible token)    -1.96
    コの    -1.93
    (no visible token)    -1.93
    (no visible token)    -1.93
     idée    -1.92
    俺は    -1.91
     selben    -1.90
    faatkan    -1.88
    (no visible token)    -1.88
    Positive Logits
    0    3.25
    3    2.84
    1    2.75
    7    2.75
    2    2.69
    8    2.66
    .    2.61
    4    2.17
     schönes    2.11
    9    2.09
    Act Density 0.049%

    No Known Activations