INDEX
Explanations
attributions or quotes in text
instances of people being quoted
Negative Logits
estern      -0.84
ĸļ士         -0.79
xtap        -0.75
¥ŀ          -0.74
ntil        -0.72
pleting     -0.68
ucha        -0.66
earthqu     -0.65
\/\/        -0.64
Written     -0.64
Positive Logits
goodbye     1.34
hello       0.90
aloud       0.86
Goodbye     0.82
farewell    0.81
mith        0.76
loudly      0.74
sorry       0.73
:]          0.72
bluntly     0.71
Activations Density 0.068%