Explanations
references to oversight and control mechanisms
Head Attr Weights
0:0.06
1:0.01
2:0.33
3:0.05
4:0.06
5:0.07
6:0.01
7:0.04
8:0.14
9:0.03
10:0.09
11:0.05
Negative Logits
Xan      -1.13
plete    -1.09
OPS      -1.07
ocre     -1.07
cures    -1.06
Boone    -1.05
behind   -1.05
belie    -1.04
apest    -1.04
NCT      -1.03
Positive Logits
guise        1.67
provocation  1.45
ausp         1.39
pretext      1.38
microscope   1.37
龍契士       1.32
Downloadha   1.29
�            1.27
territorial  1.25
士           1.25
Activations Density: 0.059%