INDEX
Explanations
Specific names and titles of locations, organizations, and events
Capitalized abbreviations and names
Category names and proper nouns
Negative Logits
Token          Logit
’,             -0.59
?”.            -0.58
?”,            -0.57
’).            -0.57
?")            -0.56
),”            -0.56
?',            -0.56
=").           -0.55
?’             -0.55
addCriterion   -0.54
Positive Logits
Token          Logit
<eos>          1.34
https          1.10
↵↵↵            1.05
↵↵↵↵           1.01
↵↵↵↵↵          1.00
http           0.99
↵↵↵↵↵↵↵        0.97
https          0.95
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.93
↵↵             0.93
Activations Density: 1.015%
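
The activation density reported above is most naturally read as the fraction of token positions on which the feature fires. The page does not state how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active; the source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for illustration: 2 of 8 positions fire.
sample = [0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 1.1, 0.0]
print(f"{activation_density(sample):.3%}")  # prints 25.000%
```

Under this reading, a density of 1.015% would mean the feature is active on roughly one token position in a hundred.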