    Explanations

    a + positive adjectives

    Negative Logits
     약간            0.47
     strana        0.44
     KIND          0.41
     낮은            0.40
     কিছু           0.40
                   0.38
     중요            0.37
     기술            0.36
    一些             0.36
    技術             0.36
    Positive Logits
     truly         0.62
     refreshing    0.54
     prosperous    0.53
     stylish       0.52
     cohesive      0.52
     harmonious    0.52
     seamless      0.51
     thriving      0.51
     sleek         0.51
     flawless      0.51
    Act Density 0.019%

    No Known Activations