    NEGATIVE LOGITS
    Token               Logit
    "乾隆"               0.40
    "ippi"              0.39
    "ண்டும்"             0.38
    " KK"               0.37
    "ophores"           0.37
    " Kraft"            0.35
    "PU"                0.35
    "platte"            0.35
    "└──"               0.35
    " pla"              0.35
    POSITIVE LOGITS
    Token               Logit
    "แค่"                0.82
    " simply"           0.81
    "simply"            0.77
    " simplesmente"     0.70
    " просто"           0.69
    " simplement"       0.67
    " semplicemente"    0.64
    " Simply"           0.62
    "Simply"            0.60
    " सिंपली"            0.60
    Activation density: 0.004%

    No known activations.