INDEX
Explanations
declarative statements conveying certainty or opinions
Head Attr Weights
0:0.01
1:0.01
2:0.10
3:0.08
4:0.05
5:0.02
6:0.02
7:0.44
8:0.03
9:0.03
10:0.06
11:0.09
Negative Logits
ebook        -1.63
anza         -1.56
ulia         -1.56
utterstock   -1.52
urat         -1.41
hijacked     -1.36
microscope   -1.35
laund        -1.34
smugglers    -1.33
library      -1.30
Positive Logits
Goodbye      1.69
brav         1.55
±            1.49
atoon        1.49
"""          1.49
rity         1.46
goodbye      1.43
Thank        1.33
sincerity    1.32
advis        1.32
Activations Density 0.007%