Explanations
phrases that indicate actions related to testing, comparing, or utilizing services and tools on platforms
before prepositions
specific category names
Negative Logits
所以  -0.89
所以  -0.83
Therefore  -0.79
Hence  -0.78
поэтому  -0.76
لذلك  -0.76
hence  -0.75
Hence  -0.75
therefore  -0.74
dlatego  -0.74
Positive Logits
yourself  1.11
nhé  0.97
就行了  0.87
yourselves  0.83
吧  0.82
yourself  0.82
就行  0.81
即可  0.77
吧  0.77
your  0.74
Activations Density 0.607%