    Explanations

    positive appraisal and understanding

    Negative Logits
    " Agreed"       0.51
    " disagreed"    0.50
    "好吧"          0.49
    " disagree"     0.46
    "的态度"        0.45
    " calmed"       0.45
    " नाराजगी"      0.44
    "ok"            0.44
    "Alright"       0.44
    " agree"        0.44
    Positive Logits
    " fascinating"    0.82
    " интересный"     0.66
    " clever"         0.64
    " ingenious"      0.61
    " interesting"    0.60
    " Interesting"    0.59
    "Interesting"     0.59
    " Nowadays"       0.57
    " интерес"        0.55
    " marvelous"      0.54
    Act Density 0.005%

    No Known Activations