Explanations
Attends from "know" to other instances of "know," and from "mean" to other instances of "mean."
Head Attr Weights
0:0.06
1:0.18
2:0.05
3:0.05
4:0.10
5:0.37
6:0.10
7:0.06
Negative Logits
Token             Logit
EconPapers        -0.59
(blank)           -0.51
Autoritní         -0.51
InjectAttribute   -0.51
IsMutable         -0.50
AccessorTable     -0.50
ⓧ                 -0.50
ArrowToggle       -0.50
ModelExpression   -0.48
SourceChecksum    -0.48
Positive Logits
Token   Logit
N       0.23
a       0.22
No      0.20
new     0.20
a       0.20
Now     0.20
Fl      0.19
Fl      0.19
.       0.19
now     0.19
Activations Density 0.005%