INDEX
Explanations
statements or phrases clarifying or emphasizing a point within a text
phrases that express certainty or conclusions
Negative Logits
Token      Logit
conflic    -0.82
¥ŀ         -0.76
Tai        -0.70
xtap       -0.69
hemor      -0.67
notor      -0.67
itially    -0.66
cumbers    -0.65
estern     -0.65
phrine     -0.64
Positive Logits
Token      Logit
goodbye    1.46
hello      1.06
Goodbye    0.90
ieu        0.88
aloud      0.87
sorry      0.83
farewell   0.81
bye        0.78
hi         0.73
ings       0.71
Activations Density 0.048%