INDEX
Explanations
statements and dialogues in the text
Negative Logits
ritional    -0.77
folios      -0.72
equipped    -0.69
=~=~        -0.69
mania       -0.68
cv          -0.67
mental      -0.66
pleting     -0.65
wx          -0.63
ications    -0.62
Positive Logits
goodbye          1.40
aloud            1.14
hello            0.98
nothing          0.97
farewell         0.94
bluntly          0.94
sarcast          0.89
unequivocally    0.83
loudly           0.81
"â̦              0.79
Activations Density 0.114%