    Explanations

    "i want" / "i cannot" / "i understand"

    Negative Logits
    "ที่มี"  0.64
    "Lots"  0.57
    " ఉండే"  0.55
    " interaction"  0.53
    "Description"  0.53
    "interaction"  0.51
    "Often"  0.51
    "Interaction"  0.51
    " vibrancy"  0.50
    " 비슷"  0.49
    Positive Logits
    " urge"  0.96
    " apologize"  0.93
    " understand"  0.87
    " sincerely"  0.85
    " dares"  0.84
    " applaud"  0.80
    " presume"  0.79
    " sympathize"  0.79
    " will"  0.78
    " regret"  0.77
    Activation Density: 0.256%

    No Known Activations