Explanations
written documents such as letters, memos, articles, or blog posts
references to various types of documents and communications, such as letters and memos
Negative Logits
instead     -0.67
cause       -0.61
outweigh    -0.60
ctrl        -0.60
artifacts   -0.58
injust      -0.58
tics        -0.56
illard      -0.56
despise     -0.56
animate     -0.56
Positive Logits
nutshell     0.84
idav         0.74
announcing   0.73
interview    0.69
titled       0.68
HuffPost     0.67
emailed      0.65
fter         0.64
published    0.64
released     0.64
Activations Density 0.120%
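
The density figure above can be read as a fraction of token positions on which the feature fires. The sketch below assumes that reading (a position counts as active when its activation is above zero), which the page itself does not spell out. The function name `activation_density`, the `threshold` parameter, and the synthetic activation list are all hypothetical and are not this feature's real data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The dashboard does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example only: 10,000 token positions, 12 of them active.
acts = [0.0] * 10_000
for i in range(12):
    acts[i * 800] = 1.5  # arbitrary nonzero activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.120%"
```

Under this reading, a density of 0.120% means the feature is active on roughly 12 of every 10,000 token positions, so it is a sparse feature.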