INDEX
Explanations
corrections or clarifications in articles
references to specific articles or versions of content
Negative Logits
çͰ          -0.80
apes        -0.75
gans        -0.74
okers       -0.73
APD         -0.72
aws         -0.72
ickets      -0.71
loads       -0.70
trump       -0.69
marks       -0.69
Positive Logits
particular  1.05
article     0.84
millennium  0.79
century     0.78
newfound    0.77
alleged     0.76
latter      0.76
purported   0.76
week        0.75
month       0.75
Activations Density: 0.087%