INDEX
Explanations
occurrences of the word "but" with a higher activation value than other tokens
the conjunction "but" indicating contrasts or exceptions
Negative Logits
Token          Logit
Cycle          -0.68
Excellence     -0.62
naire          -0.55
Improvement    -0.54
Procedures     -0.53
ĺħ             -0.53
ampa           -0.53
stimulus       -0.52
ESH            -0.52
pursu          -0.51
Positive Logits
Token          Logit
chers           1.39
chery           1.28
tons            1.14
ts              1.04
alas            0.97
ted             0.96
nevertheless    0.92
ters            0.92
nonetheless     0.90
cher            0.90
Activations Density 0.089%