INDEX
    Explanations

    phrases that indicate actions related to testing, comparing, or utilizing services and tools on platforms

    before prepositions

    specific category names

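    Negative Logits
    Token (quoted; leading spaces are part of the token)    Logit
    "所以"    -0.89
    " 所以"    -0.83
    " Therefore"    -0.79
    " Hence"    -0.78
    " поэтому"    -0.76
    " لذلك"    -0.76
    " hence"    -0.75
    "Hence"    -0.75
    " therefore"    -0.74
    " dlatego"    -0.74

    Positive Logits
    Token (quoted; leading spaces are part of the token)    Logit
    " yourself"    1.11
    " nhé"    0.97
    "就行了"    0.87
    " yourselves"    0.83
    (token not rendered in source)    0.82
    "yourself"    0.82
    "就行"    0.81
    "即可"    0.77
    " 吧"    0.77
    " your"    0.74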
    Activation Density: 0.607%
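
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; the page does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" counts any activation strictly above the
    threshold as firing; the source page does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1000 token positions, 6 of which fire.
sample = [0.0] * 994 + [0.3, 1.2, 0.7, 2.1, 0.05, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.607% would mean the feature fires on roughly 6 out of every 1000 sampled tokens.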

    No Known Activations