INDEX
Explanations
words and phrases related to accountability and criticism
Head Attr Weights
0:0.02
1:0.02
2:0.07
3:0.22
4:0.09
5:0.04
6:0.06
7:0.06
8:0.04
9:0.05
10:0.14
11:0.12
Negative Logits
Tags:-1.55
rams:-1.46
gal:-1.41
hing:-1.29
ind:-1.29
stuff:-1.27
spac:-1.26
Starts:-1.26
nance:-1.25
minimum:-1.22
Positive Logits
vati:1.68
Leilan:1.64
TAMADRA:1.40
Kirin:1.38
stormed:1.38
lav:1.37
theless:1.34
Reloaded:1.31
succumbed:1.29
Schwar:1.26
Activations Density 0.021%