INDEX
Explanations
contrasting relationships or ideas
the word "yet," indicating a contrast or contradiction in various contexts
Negative Logits
タ        -0.79
esp       -0.77
ノ        -0.77
ufact     -0.76
ゼウス    -0.74
ウス      -0.73
エル      -0.73
ァ        -0.73
/         -0.72
tein      -0.72
Positive Logits
somehow       1.17
despite       1.07
again         0.96
strangely     0.93
another       0.89
nonetheless   0.87
whenever      0.82
somew         0.81
inexpl        0.79
nevertheless  0.78
Activations Density 0.046%
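
The source does not say how the density figure is computed. A minimal sketch, assuming it is the share of sampled tokens on which the feature has a nonzero activation; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce a value of 0.046%.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature fires at all; the source does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 46 of 100,000 sampled tokens.
sample = [1.0] * 46 + [0.0] * 99_954
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.046% would mean the feature is active on roughly one token in every 2,200.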