INDEX
    Explanations

    equality, Monologue, Scenario, response rewritten

    Negative Logits
    "Ironically"     0.63
    " utmost"        0.62
    "Essentially"    0.61
    " tradiz"        0.60
    "Magic"          0.60
    "Granted"        0.58
    " собственно"    0.57
    "उत्तर"          0.56
    "ುದು"            0.56
    " famously"      0.56
    Positive Logits
    "‌ترین"           0.77
    "점에서"          0.75
    " پرداخت"        0.68
    "umento"         0.67
    " Than"          0.67
    " অর্থাৎ"         0.66
    " परन्तु"          0.66
    " nhưng"         0.65
    "ሆኑ"             0.65
    " 版本"          0.65
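The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. This page does not say how those values were computed. The sketch below is minimal and assumes a common convention: a feature's direct logit effect is its decoder vector multiplied by the model's unembedding matrix. The names `top_logit_tokens`, `w_dec_row`, and `W_U` are hypothetical.

```python
import numpy as np

def top_logit_tokens(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumed inputs (hypothetical, not taken from the dashboard):
      w_dec_row: (d_model,) decoder direction for one feature
      W_U:       (d_model, vocab_size) unembedding matrix
      vocab:     list of token strings aligned with W_U's columns
    """
    # Project the decoder direction through the unembedding: one score per token.
    logits = w_dec_row @ W_U
    order = np.argsort(logits)  # ascending: most negative first
    # Most strongly suppressed tokens.
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    # Most strongly promoted tokens.
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy usage: with an identity unembedding, each token's score is one vector entry.
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_tokens(np.array([0.5, -1.0, 2.0, 0.0]), np.eye(4), vocab, k=2)
```

In the toy call, `neg` is `[("b", -1.0), ("d", 0.0)]` and `pos` is `[("c", 2.0), ("a", 0.5)]`.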
    Activation Density: 1.132%
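Activation density is usually read as the share of token positions where the feature fires. At 1.132%, that works out to roughly 11 of every 1,000 tokens. The sketch below rests on two assumptions: "fires" means an activation strictly above a threshold of zero, and density is reported as a percentage. The function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    `acts` holds one activation value per token position (assumed layout).
    """
    acts = np.asarray(acts)
    # Count positions where the feature is active, then scale to a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy usage: 2 active positions out of 8 gives 25.0 (percent).
density = activation_density([0.0, 0.0, 0.3, 0.0, 1.2, 0.0, 0.0, 0.0])
```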

    No Known Activations