INDEX
Explanations
links to images, specifically Twitter image links
punctuation marks, specifically periods
Negative Logits
audits              -0.69
involuntary         -0.66
volunt              -0.64
confidentiality     -0.62
indemn              -0.62
tsun                -0.61
conformity          -0.60
equilibrium         -0.60
interchange         -0.60
disadvant           -0.59
Positive Logits
[token not shown]   1.66
[token not shown]   1.13
imgur               1.08
[token not shown]   1.05
twitch              1.02
redd                0.94
youtube             0.93
[token not shown]   0.90
blogspot            0.90
github              0.87
Activation density: 0.026%