INDEX
Explanations
"Attention is All You Need" paper
Negative Logits
health     0.94
help       0.91
stamps     0.90
helps      0.90
boots      0.90
Helpline   0.88
healthier  0.88
trink      0.87
penn       0.87
cruel      0.87
Positive Logits
PhysRev  1.22
arXiv    1.12
論文     1.09
eynman   1.03
مقاله    1.02
JHEP     1.00
]_       1.00
構造     0.99
论文     0.99
seminal  0.99
Activations Density: 0.030%