    NEGATIVE LOGITS
    𝙀            0.49
    као          0.48
    igneur       0.48
    depending    0.48
    кен          0.47
    ütfen        0.46
    [missing]    0.46
    وامی         0.45
    зне          0.45
    [missing]    0.45
    POSITIVE LOGITS
     Diagn       0.48
     Tisch       0.45
     ODE         0.45
     Narr        0.45
     Cafe        0.43
     Bets        0.43
     therapist   0.43
     humanities  0.42
     R           0.42
     café        0.42
    Activation Density: 0.001%

    No Known Activations