    NEGATIVE LOGITS
    "ли"      0.80
    "ле"      0.71
    " νέα"    0.64
    "ktiv"    0.64
    "ς"       0.64
    "nalia"   0.63
    "?"       0.61
    "down"    0.59
    "ssä"     0.59
    "ap"      0.58
    POSITIVE LOGITS
    " candies"  0.95
    " candy"    0.85
    "Candy"     0.84
    " Candy"    0.82
    "🍬"        0.80
    "۹"         0.74
    "🍭"        0.73
    "্পনিক"     0.71
    " ドラ"     0.68
    "ATIVE"     0.65
    Act Density 0.002%

    No Known Activations