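    NEGATIVE LOGITS
    (Tokens are quoted; a leading space is part of the token.)
    Token               Value   Note
    " XOR"              0.50
    " demoral"          0.47
    " macroscopic"      0.46
    " polymorphism"     0.46
    " 벡터"             0.46    Korean, "vector"
    " সক্র"             0.45    Bengali word fragment
    "🧠"                0.45
    "🦾"                0.45
    " ReLU"             0.44
    " Diagnostics"      0.43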
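    POSITIVE LOGITS
    Token               Value   Note
    " wedding"          2.27
    " Wedding"          2.09
    "wedding"           2.09
    "Wedding"           2.08
    " weddings"         1.99
    "婚礼"              1.91    Chinese, "wedding"
    " bridal"           1.81
    " Weddings"         1.76
    "👰"                1.70
    " свадь"            1.66    Russian fragment of "свадьба" ("wedding")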
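The two token lists above can be approximated with a minimal sketch, assuming each value is the dot product of the feature's decoder direction with one token's column of the model's unembedding matrix, with tokens then ranked from most positive to most negative. This projection is a common convention for sparse-autoencoder feature dashboards, not something this page confirms; `decoder_dir`, `unembed`, `vocab`, `logit_effects`, and `top_tokens` are hypothetical names, and the toy numbers are invented for illustration.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumed (hypothetical) layout: unembed is d_model x vocab_size, so
    unembed[i][t] is the weight from residual dimension i to token t.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_tokens(effects: Sequence[float], vocab: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[::-1][:k]  # most negative first
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = [" wedding", " XOR", "wedding", " ReLU"]
decoder_dir = [1.0, -0.5]
unembed = [
    [2.0, 0.25, 1.75, 0.0],
    [0.5, 1.0, 0.5, 1.0],
]
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_tokens(effects, vocab, k=2)
print(positive)  # [(' wedding', 1.75), ('wedding', 1.5)]
print(negative)  # [(' ReLU', -0.5), (' XOR', -0.25)]
```

Pure Python keeps the sketch self-contained; with a real vocabulary of tens of thousands of tokens, the same projection would normally be a single matrix-vector product in a tensor library.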
    Activation density: 0.031%

    No known activations.
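A minimal sketch of how an activation-density figure like the one above could be computed, assuming density means the percentage of sampled token positions on which the feature's activation is strictly positive. The threshold and the token sample are assumptions, since the page does not state them, and `activation_density` is a hypothetical helper. Under that reading, 0.031% works out to about 31 active positions per 100,000 tokens.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 (i.e. "the feature fired at all") is an
    assumption, not a documented setting.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations to measure")
    return 100.0 * active / total

# 31 active positions out of 100,000 sampled tokens.
sample = [0.0] * 99_969 + [1.0] * 31
print(activation_density(sample))  # 0.031
```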