INDEX
Explanations
references to casinos and gambling
Neuron Alignment
Index   Value   % of L₁
1178    +0.15   0.5%
421     +0.09   0.3%
1909    +0.07   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1178    +0.15      0.03
421     +0.09      0.02
1591    +0.07      0.02
Negative Logits
<bos>        -0.80
expel        -0.65
endow        -0.64
/**          -0.60
beforeAll    -0.59
<?           -0.58
/*++         -0.57
<!--         -0.56
neutralize   -0.56
reinstate    -0.55
Positive Logits
Casino     2.12
casino     2.02
Casino     1.92
casino     1.77
Casinos    1.76
casinos    1.71
Gambling   1.45
gambling   1.42
Gambling   1.29
gamblers   1.28
Activations Density: 0.132%