INDEX
Explanations
surprising or amazed reactions in text
phrases that express surprise or unexpectedness
Negative Logits
burgh          -0.80
illes          -0.67
bye            -0.66
throats        -0.66
aim            -0.62
ère            -0.61
alach          -0.60
ulton          -0.59
Recommended    -0.59
fid            -0.59
Positive Logits
similarities    0.81
how             0.70
similarity      0.69
discrepancies   0.68
fusc            0.67
how             0.65
Conserv         0.64
unanim          0.64
parallels       0.64
surprise        0.64
Activations Density 0.239%