INDEX
Explanations
references to specific years and numerical data related to events or statistics
Head Attr Weights
0:0.10
1:0.02
2:0.03
3:0.07
4:0.18
5:0.17
6:0.04
7:0.01
8:0.08
9:0.05
10:0.01
11:0.17
Negative Logits
undrum: -1.92
deity: -1.89
torch: -1.79
Assembly: -1.76
Ancient: -1.75
heavenly: -1.72
Flavoring: -1.71
Redditor: -1.70
Blu: -1.70
Chicken: -1.69
Positive Logits
meanwhile: 2.36
thereafter: 2.08
incidents: 2.06
surveys: 2.03
SPONSORED: 2.01
Sessions: 1.98
interns: 1.96
onwards: 1.95
Carroll: 1.90
however: 1.84
Activations Density 0.002%