INDEX
Explanations
phrases related to personal experience or letters
Head Attr Weights
0:0.03
1:0.01
2:0.07
3:0.13
4:0.25
5:0.02
6:0.11
7:0.13
8:0.04
9:0.03
10:0.04
11:0.08
Negative Logits
Rated                              -2.02
Cosponsors                         -1.74
redits                             -1.68
reek                               -1.45
orrow                              -1.39
owered                             -1.39
Tech                               -1.36
instead                            -1.36
quickShipAvailable                 -1.35
=================================  -1.33
Positive Logits
caveats       1.95
exceptions    1.87
occasional    1.81
ones          1.53
obvious       1.52
limitations   1.50
nuts          1.50
caveat        1.49
fluctuations  1.44
prohibitions  1.40
Activations Density 0.005%