INDEX
Explanations
phrases related to writing, opinions, and statements of facts
expressions of complex emotional experiences and annotations related to writing
Negative Logits
)."    -0.82
.""    -0.68
.).    -0.61
)"     -0.59
'."    -0.58
]."    -0.57
Lv     -0.56
.''    -0.55
.'"    -0.55
.",    -0.53
Positive Logits
Canaver           0.70
Spoiler           0.65
etheless          0.59
explanations      0.54
Berks             0.52
libertarians      0.51
forgiven          0.51
awfully           0.50
whistleblowers    0.50
Krugman           0.50
Activations Density 2.609%