INDEX
Explanations
article references within a historical or academic context
brackets or other enclosing symbols
Negative Logits
stalls          -0.71
ateurs          -0.65
thrott          -0.65
nesday          -0.64
simulate        -0.62
stands          -0.62
Opportun        -0.62
sinks           -0.61
Takes           -0.60
proportional    -0.60
Positive Logits
note            1.23
...]            1.21
Pg              1.19
…]              1.05
Footnote        0.94
reviewed        0.93
?]              0.89
Note            0.89
].              0.88
etc             0.87
Activations Density 0.020%