INDEX
Explanations
questions and statements that challenge the status quo or express skepticism
Head Attr Weights
0:0.04
1:0.06
2:0.02
3:0.10
4:0.07
5:0.34
6:0.06
7:0.03
8:0.07
9:0.11
10:0.03
11:0.03
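The head attribution weights above can be summarized with a few lines of arithmetic. The following is a minimal sketch, assuming each entry reads as `head index:weight`; the variable names are hypothetical and are not part of the dashboard.

```python
# Head attribution weights copied from the list above (head index -> weight).
# Assumption: each "i:w" entry means head i has attribution weight w.
head_weights = {
    0: 0.04, 1: 0.06, 2: 0.02, 3: 0.10, 4: 0.07, 5: 0.34,
    6: 0.06, 7: 0.03, 8: 0.07, 9: 0.11, 10: 0.03, 11: 0.03,
}

# Sum of the listed (rounded) weights.
total = sum(head_weights.values())

# Head with the largest weight, and its share of the listed total.
top_head = max(head_weights, key=head_weights.get)
top_share = head_weights[top_head] / total

# Heads ranked from largest to smallest weight.
ranked = sorted(head_weights.items(), key=lambda kv: kv[1], reverse=True)

print(f"sum of listed weights: {total:.2f}")
print(f"dominant head: {top_head} ({top_share:.0%} of listed weight)")
print("top three heads:", ranked[:3])
```

On these numbers, head 5 carries the largest weight (0.34), followed by heads 9 and 3. The listed weights sum to 0.96 rather than 1.00, presumably because each value is rounded to two decimals.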
Negative Logits
Berm         -2.07
Helic        -1.95
oak          -1.91
crane        -1.91
Misty        -1.89
Windsor      -1.88
Cherokee     -1.88
Cobra        -1.87
Lizard       -1.87
Bermuda      -1.86
Positive Logits
meaningless   2.41
anyways       2.37
shouldn       2.32
detriment     2.28
inefficient   2.23
urden         2.23
anyway        2.21
useless       2.18
inevitably    2.14
hemer         2.13
Activations Density 0.004%