INDEX
Explanations
themes related to consequences, both positive and negative, in various contexts
Head Attr Weights
Head  Weight
0     0.03
1     0.02
2     0.10
3     0.17
4     0.12
5     0.04
6     0.22
7     0.07
8     0.03
9     0.04
10    0.05
11    0.05
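The head weights above are easier to read once ranked. The short Python sketch below assumes that each index is an attention-head index (0 to 11) within a single layer, as the label "Head Attr Weights" suggests. It lists the dominant heads and their share of the total. The displayed two-decimal values sum to 0.94 rather than 1.00, so the share is computed relative to the displayed total. The function names are illustrative, not part of any dashboard API.

```python
# Head attribution weights as displayed on the page (head index -> weight).
HEAD_ATTR_WEIGHTS = {
    0: 0.03, 1: 0.02, 2: 0.10, 3: 0.17, 4: 0.12, 5: 0.04,
    6: 0.22, 7: 0.07, 8: 0.03, 9: 0.04, 10: 0.05, 11: 0.05,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight, highest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


def cumulative_share(weights, heads):
    """Sum of the given heads' weights divided by the sum of all displayed weights."""
    total = sum(weights.values())
    return sum(weights[h] for h in heads) / total


if __name__ == "__main__":
    best = top_heads(HEAD_ATTR_WEIGHTS)
    print(best)  # [(6, 0.22), (3, 0.17), (4, 0.12)]
    share = cumulative_share(HEAD_ATTR_WEIGHTS, [h for h, _ in best])
    print(f"{share:.2f}")  # 0.54
```

On these numbers, heads 6, 3 and 4 together account for about 54% of the displayed attribution. That suggests the feature is read mostly through a few heads rather than spread evenly across all twelve.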
Negative Logits
Rated        -1.65
Annotations  -1.62
fleet        -1.56
aido         -1.44
Nanto        -1.41
uyomi        -1.35
psey         -1.34
reply        -1.33
[+           -1.31
Rico         -1.27
Positive Logits
bang         1.41
ourselves    1.36
backward     1.34
democrat     1.28
perfection   1.24
essentials   1.22
apples       1.22
peripher     1.21
convenient   1.19
backwards    1.19
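Lists of negative and positive logits like the two above are commonly produced by direct logit attribution. In this method, the feature's output direction is multiplied by the model's unembedding matrix, and the tokens with the lowest and highest scores are kept. The Python sketch below shows that computation under several assumptions. It assumes the decoder direction has already been mapped into the residual stream. For an attention-output feature, that would mean passing it through the attention output weights first. It also ignores any final layer norm. The names `decoder_dir`, `W_U` and `vocab` are hypothetical stand-ins, not the dashboard's actual implementation.

```python
import numpy as np


def logit_attribution(decoder_dir, W_U, vocab, k=10):
    """Project a feature direction through the unembedding.

    Returns (negative, positive): the k most negative (token, score) pairs,
    most negative first, and the k most positive pairs, most positive first.
    """
    scores = decoder_dir @ W_U            # one score per vocabulary entry
    order = np.argsort(scores)            # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: random weights and a tiny vocabulary.
    rng = np.random.default_rng(0)
    d_model = 8
    vocab = ["bang", "ourselves", "Rated", "fleet", "apples", "reply"]
    W_U = rng.normal(size=(d_model, len(vocab)))   # (d_model, d_vocab)
    decoder_dir = rng.normal(size=d_model)         # feature output direction
    neg, pos = logit_attribution(decoder_dir, W_U, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

The ordering mirrors the page: the negative list starts with the most negative score (-1.65) and the positive list with the largest (1.41).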
Activations Density 0.026%
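An activation density of 0.026% marks this as a sparse feature. The sketch below assumes the usual sparse-autoencoder convention, where density is the fraction of sampled token positions at which the activation is above zero; the threshold parameter is an assumption. Under that reading, 0.026% equals 0.00026, or roughly one firing per 1 / 0.00026 ≈ 3,846 tokens. The 100,000-token array is a hypothetical example, not data from this feature.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`."""
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))


def tokens_per_activation(density):
    """Average number of tokens per firing implied by a given density."""
    return 1.0 / density


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 26 of 100,000 tokens.
    acts = np.zeros(100_000)
    acts[:26] = 1.0
    print(f"{activation_density(acts):.3%}")          # 0.026%
    print(round(tokens_per_activation(0.026 / 100)))  # 3846
```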