Explanations
phrases that indicate a contrasting or surprising element in a sentence
repetitive phrases that contrast or introduce conditions
Negative Logits
Token     Logit
エル      -0.78
ノ        -0.78
urated    -0.77
ゼウス    -0.76
esp       -0.76
tein      -0.75
ァ        -0.73
ufact     -0.72
ウス      -0.72
タ        -0.70
Positive Logits
Token          Logit
somehow        1.19
despite        0.99
strangely      0.96
nonetheless    0.95
again          0.94
another        0.89
nevertheless   0.87
inexpl         0.84
mirac          0.82
somew          0.82
Activations Density 0.039%
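
Note on the non-ASCII tokens in the Negative Logits list: byte-level BPE vocabularies store each UTF-8 byte as a printable stand-in character, so a katakana token such as エル is stored as the string ãĤ¨ãĥ«. The sketch below reverses that mapping. It assumes the vocabulary uses the GPT-2-style byte-to-unicode table, which the Ĥ and ĥ characters suggest but the dashboard does not confirm. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2-style table mapping each byte (0-255) to a printable character.

    Assumption: the model behind this dashboard uses this table.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # A token may end mid-character, so replace undecodable bytes instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¨ãĥ«", "ãĤ¼ãĤ¦ãĤ¹", "urated"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as urated pass through unchanged, so the same function can be applied to every row of a logit list.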