    Negative Logits
    на          1.96
    ت           1.71
                1.67
    ا           1.62
    时候        1.62
    𝑡           1.61
    𝖔           1.61
    ان          1.61
                1.57
    о           1.56
    Positive Logits
     solubility     1.60
    өр              1.55
     brevity        1.54
     Temperature    1.51
     Typography     1.50
     omitting       1.50
    oung            1.47
     gratification  1.47
     Solubility     1.45
     temperature    1.45
    Activation Density: 0.005%

    No Known Activations