    Explanations

    offering help and asking what to do

    Negative Logits
    Token            Value
    " bullshit"      0.49
    " emocional"     0.47
    "fuck"           0.46
    " warnings"      0.46
    " fís"           0.45
    " fucking"       0.44
    " remedial"      0.42
    " emotionally"   0.42
    "べき"           0.42
    " warns"         0.41
    Positive Logits
    Token            Value
    " exciting"      0.70
    " 😊"            0.66
    "楽しい"         0.64
    " wonderful"     0.63
    "愉快"           0.62
    "😊"             0.61
    "素敵な"         0.61
    " groovy"        0.60
    " :)"            0.59
    " awesome"       0.58
    Activation Density: 0.078%

    No Known Activations