    Explanations

    offering help and polite conversation

    Negative Logits
    "=="            0.49
    "These"         0.38
    "="             0.38
    " meaningless"  0.37
    " केंद्रित"  0.37
    " wget"         0.37
    " downloading"  0.37
    " rotting"      0.36
    " cardinality"  0.36
    " attacks"      0.36
    Positive Logits
    " conversar"     0.61
    " politely"      0.59
    " conversación"  0.59
    " courteous"     0.56
    " membantu"      0.55
    " помочь"  0.55
    " respectful"    0.55
    " cheerfully"    0.54
    " আন্তরিক"  0.53
    " help"          0.53
    Activation Density: 1.877%

    No Known Activations