    Explanations

    expressions of gratitude

    Negative Logits
     (blank)  0.59
     animaux  0.49
     s  0.46
     dart  0.46
     nightmares  0.46
     jinak  0.45
     irritating  0.44
     eine  0.44
     monitor  0.42
     annoying  0.42
    Positive Logits
     sincer  0.68
    слен  0.64
     কৃতজ্ঞ  0.64
     всех  0.63
    祝福  0.63
     तहे  0.62
     आशीष  0.61
    すべての  0.60
     جميع  0.59
     всіх  0.59
    Activation Density: 0.009%

    No Known Activations