INDEX
Explanations
references to whistleblowers and related terminology
Head Attr Weights
0:0.03
1:0.01
2:0.05
3:0.06
4:0.05
5:0.03
6:0.39
7:0.08
8:0.04
9:0.05
10:0.11
11:0.05
Negative Logits
States       -1.51
Calculator   -1.44
Monetary     -1.42
ICAN         -1.40
ァ           -1.38
WAY          -1.37
affinity     -1.37
ONSORED      -1.36
Chan         -1.35
uala         -1.35
Positive Logits
glers        1.75
sed          1.71
etheus       1.60
nsics        1.43
iren         1.43
irens        1.40
gging        1.38
crack        1.38
killed       1.38
puff         1.37
Activations Density: 0.001%