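    Negative Logits
      " moreover"      0.53
      " zudem"         0.52   (German: "moreover")
      " 而且"          0.48   (Chinese: "moreover, and also")
      "因而"           0.48   (Chinese: "therefore")
      "Moreover"       0.45
      " inoltre"       0.44   (Italian: "moreover")
      "それでも"       0.44   (Japanese: "even so")
      " furthermore"   0.43
      "또한"           0.43   (Korean: "also, moreover")
      " dagegen"       0.42   (German: "on the other hand")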
    Positive Logits
      " 👋"            0.85
      " glad"          0.78
      " Glad"          0.75
      " Thanks"        0.75
      "glad"           0.73
      " Since"         0.71
      " Sorry"         0.71
      " Firstly"       0.71
      " Öncelikle"     0.70   (Turkish: "first of all")
      " Allow"         0.70
    Activation Density: 0.195%

    No Known Activations
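The logit lists and the activation density above can be illustrated with a short sketch. It assumes the common SAE-dashboard convention that a feature's positive and negative logits come from projecting its decoder direction through the model's unembedding matrix and ranking the vocabulary by score, and that activation density is the fraction of token positions on which the feature fires. The toy decoder row `w_dec`, the toy unembedding `W_U` (rows = model dimensions, columns = vocabulary entries), and the helper names `feature_logits`, `top_logits`, and `activation_density` are hypothetical; this is not the dashboard's actual implementation.

```python
from typing import List, Sequence, Tuple


def feature_logits(w_dec: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary entry: scores[j] = sum_i w_dec[i] * W_U[i][j]."""
    d_model, vocab_size = len(W_U), len(W_U[0])
    assert len(w_dec) == d_model, "decoder row must match the unembedding's row count"
    return [sum(w_dec[i] * W_U[i][j] for i in range(d_model)) for j in range(vocab_size)]


def top_logits(
    w_dec: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    scores = feature_logits(w_dec, W_U)
    order = sorted(range(len(scores)), key=lambda j: scores[j])  # ascending by score
    negative = [(vocab[j], scores[j]) for j in order[:k]]  # most negative first
    positive = [(vocab[j], scores[j]) for j in order[::-1][:k]]  # most positive first
    return positive, negative


def activation_density(acts: Sequence[float]) -> float:
    """Fraction of token positions on which the feature's activation is > 0."""
    return sum(1 for a in acts if a > 0) / len(acts)
```

For example, with the decoder row `[0.8, -0.5, 0.7, -0.4]` and a 4×5 identity-like `W_U`, `top_logits(..., k=2)` ranks vocabulary entries 0 and 2 highest and entries 1 and 3 lowest; a feature that fires on 2 of 1,000 tokens has a density of 0.2%. Whether the dashboard also applies a final layer norm, adds a bias, or displays magnitudes for negative logits is not stated here, so the sketch ignores those details.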