    Explanations

    model's conversational start

    Negative Logits
    Token              Value
    " entail"          0.34
    " mz"              0.32
    " transgress"      0.32
    " aligning"        0.32
    "кови"             0.31
    " interconnect"    0.31
    " convening"       0.30
    " aligned"         0.30
    " parenteral"      0.30
    " mastering"       0.30
    Positive Logits
    Token              Value
    "哈哈"             0.41
    "Which"            0.40
    "Yep"              0.40
    "Haha"             0.39
    "哈哈哈"           0.37
    " Yep"             0.36
    " איך"             0.36
    " Friendly"        0.36
    " That"            0.35
    "esta"             0.35
    Activation Density 0.035%

    No Known Activations