    Explanations

    evaluative adjectives indicating quality or preference

    Negative Logits
    Token        Logit
    "552"        -0.16
    "dia"        -0.14
    "pii"        -0.14
    " 赤"        -0.14
    "beit"       -0.14
    "idis"       -0.14
    "ANTA"       -0.14
    " Pot"       -0.14
    "Ậ"          -0.14
    "ONGL"       -0.13
    Positive Logits
    Token        Logit
    " way"       0.31
    " idea"      0.29
    " Idea"      0.27
    " choice"    0.26
    " thing"     0.25
    " bet"       0.24
    " option"    0.23
    " strategy"  0.22
    " Way"       0.22
    "strategy"   0.21
    Activation Density: 0.061%

    No Known Activations