INDEX
Explanations
comparative expressions and phrases that prioritize one thing over another
Head Attr Weights
0:0.02
1:0.01
2:0.13
3:0.05
4:0.14
5:0.03
6:0.04
7:0.29
8:0.09
9:0.03
10:0.04
11:0.07
Negative Logits
redited     -1.81
────────    -1.76
RANT        -1.73
────        -1.64
iard        -1.60
���         -1.56
enthusi     -1.54
accredited  -1.49
ocry        -1.48
ibaba       -1.47
Positive Logits
obscurity   1.76
die         1.52
guarding    1.50
eering      1.49
Skydragon   1.47
Posts       1.46
etary       1.45
mere        1.42
SERV        1.42
keeping     1.41
Activations Density 0.003%