Explanations
phrases indicating comparison or contrast
NEGATIVE LOGITS
.this        -0.15
तम           -0.15
riter        -0.14
なら         -0.14   (Japanese, "if")
hence        -0.14
@update      -0.14
therefore    -0.14
trl          -0.13
uien         -0.13
plx          -0.13
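Raw exports of dashboards like this one often show non-Latin tokens as mojibake such as ãģªãĤī. This happens because GPT-2-family tokenizers store every token as a byte-level string, mapping each UTF-8 byte to a printable character. The sketch below reverses that mapping. It assumes the model uses such a tokenizer, which the dashboard does not state, and the function names are illustrative. The same convention writes a leading space as Ġ, which plausibly explains why "although" appears twice in the positive list, with two different values.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table mapping each byte value (0-255) to a printable character."""
    # Bytes that are already printable characters map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (space, control characters, etc.) are shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def decode_byte_level(token: str) -> str:
    """Turn a byte-level BPE token string back into readable UTF-8 text."""
    byte_decoder = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(byte_decoder[ch] for ch in token)
    # errors="replace" keeps tokens that hold only part of a multi-byte character printable.
    return raw.decode("utf-8", errors="replace")


print(decode_byte_level("ãģªãĤī"))          # なら
print(repr(decode_byte_level("Ġalthough")))  # ' although'
```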
POSITIVE LOGITS
although     0.85
although     0.69
Although     0.66
虽然         0.63   (Chinese, "although")
while        0.63
Although     0.62
though       0.58
aunque       0.56   (Spanish, "although")
虽           0.52   (Chinese, first character of 虽然)
While        0.52
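Lists of positive and negative logits like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The dashboard does not describe its exact procedure, for instance whether a final LayerNorm scale is folded in. The sketch below is therefore only an assumption. `w_dec`, `W_U` and `vocab` are hypothetical names, and the toy numbers are made up for illustration.

```python
import numpy as np


def logit_effects(w_dec: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 10):
    """Return the k most negative and k most positive direct logit effects of one feature.

    w_dec: the feature's decoder vector, shape (d_model,)       [hypothetical name]
    W_U:   unembedding matrix, shape (d_model, vocab_size)       [hypothetical name]
    """
    effects = w_dec @ W_U          # one score per vocabulary token
    order = np.argsort(effects)    # token indices sorted by ascending score
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values).
vocab = ["although", "hence", "while", "therefore"]
W_U = np.array([[0.75, -0.25, 0.5, -0.5],
                [0.25,  0.0,  0.25, 0.25]])
w_dec = np.array([1.0, 0.5])
neg, pos = logit_effects(w_dec, W_U, vocab, k=2)
print(pos)  # [('although', 0.875), ('while', 0.625)]
print(neg)  # [('therefore', -0.375), ('hence', -0.25)]
```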
Activation Density 0.440%
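Activation density is usually reported as the share of sampled tokens on which the feature is active. Under that reading, 0.440% means about 4.4 active tokens per 1,000. Below is a minimal sketch, assuming "active" means an activation strictly above zero. That threshold is an assumption, since the dashboard does not define it, and the sample data are hypothetical.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Hypothetical sample: 10,000 tokens, feature active on 44 of them.
acts = np.zeros(10_000)
acts[:44] = 2.3
print(f"{activation_density(acts):.3f}%")  # 0.440%
```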