INDEX
Explanations
author references in academic writing
references to academic papers, specifically those denoted by "et al."
Negative Logits
FUL         -0.81
OUNT        -0.79
canon       -0.76
ardless     -0.69
esses       -0.68
velength    -0.65
ppo         -0.64
finger      -0.63
@#&         -0.63
IFT         -0.62
Positive Logits
seq         1.19
rics        0.92
al          0.87
ween        0.85
hetically   0.81
ree         0.79
iated       0.79
iation      0.78
ching       0.77
sis         0.77
Activations Density 0.015%