    Explanations

    they talk, think, preach, say

    Negative Logits
     באופן    0.79
     perceptual    0.75
    非常に    0.71
     également    0.70
     אך    0.70
     oneself    0.68
     tuttavia    0.68
     매우    0.68
     vollständig    0.68
     наиболее    0.68
    Positive Logits
     gonna    0.88
     whining    0.88
     brag    0.85
     complaining    0.79
     bragging    0.77
    [token text missing]    0.77
     got    0.76
    တွေ    0.74
     wanna    0.74
    ってる    0.73
    Activation Density: 0.133%

    No Known Activations