    Negative Logits
    " стихо"                0.59
    [token text missing]    0.55
    " شاعری"                0.51
    " शायरी"                0.50
    " Reddit"               0.49
    " شیع"                  0.49
    " formulario"           0.48
    "🗑"                    0.48
    " Shayari"              0.47
    "🗞"                    0.47
    Positive Logits
    " simulation"     1.37
    " simulations"    1.31
    "Simulation"      1.20
    " simulate"       1.17
    "simulation"      1.17
    " Simulation"     1.13
    " simulates"      1.09
    "模拟"            1.05
    "仿真"            1.02
    " simulating"     1.02
    Act Density 0.110%

    No Known Activations