Explanations
Sentences that convey ethical or moral implications related to various topics
Head Attr Weights (head: weight)
0: 0.07
1: 0.02
2: 0.17
3: 0.27
4: 0.05
5: 0.03
6: 0.04
7: 0.03
8: 0.06
9: 0.08
10: 0.08
11: 0.05
Negative Logits (token, logit)
—"        -2.23
McCull    -1.76
Ancients  -1.67
Guant     -1.62
Bill      -1.59
Pu        -1.57
Chron     -1.55
Arist     -1.53
Cele      -1.52
Newsp     -1.52
Positive Logits (token, logit)
wx           1.93
etheless     1.91
"]=>         1.84
plugin       1.82
sonian       1.77
ISON         1.76
PDATED       1.76
NOTE         1.74
displayText  1.71
EDIT         1.71
Activations Density: 0.057%