INDEX
Explanations
phrases involving revelation or uncovering details
Head Attr Weights
0:0.03
1:0.03
2:0.06
3:0.09
4:0.29
5:0.04
6:0.03
7:0.12
8:0.05
9:0.03
10:0.10
11:0.07
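
The head attribution list above can be read programmatically. The following is a minimal sketch, assuming each entry is a `head:weight` pair exactly as listed; the names `RAW`, `parse_head_weights`, and `rank_heads` are hypothetical helpers introduced for illustration and are not part of any dashboard API.

```python
# Head attribution weights copied verbatim from the list above,
# one "head:weight" pair per line (assumed format).
RAW = """\
0:0.03
1:0.03
2:0.06
3:0.09
4:0.29
5:0.04
6:0.03
7:0.12
8:0.05
9:0.03
10:0.10
11:0.07
"""


def parse_head_weights(text):
    """Parse 'head:weight' lines into a dict mapping head index to weight."""
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, value = line.split(":", 1)
        weights[int(head)] = float(value)
    return weights


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


weights = parse_head_weights(RAW)
ranked = rank_heads(weights)
top_head, top_weight = ranked[0]
total = sum(weights.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

Ranking the entries this way makes the dominant head easy to spot; in the values listed here, head 4 carries the largest weight (0.29).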
Negative Logits
ourt       -1.67
rave       -1.60
bothered   -1.53
roundup    -1.49
uproar     -1.43
aired      -1.42
antiv      -1.42
approves   -1.41
Davis      -1.39
preseason  -1.38
Positive Logits
hidden     1.93
hidden     1.76
itives     1.74
treasures  1.70
stery      1.69
selves     1.66
harmless   1.61
treasure   1.60
ifacts     1.51
fug        1.48
Activations Density 0.003%