INDEX
Explanations
mentions of alcohol consumption and behavioral changes related to drinking habits
Neuron Alignment
Index   Value   % of L₁
1253    +0.14   0.4%
198     +0.13   0.4%
1445    +0.11   0.3%
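The "% of L₁" column is not defined on this page. One plausible reading, treated here as an assumption, is that it gives each neuron's absolute weight as a share of the L₁ norm (the sum of absolute values) of the feature's full weight vector. Under that reading, a value of 0.14 at 0.4% would imply an L₁ norm of very roughly 35. A minimal Python sketch of that reading follows; the helper name `l1_share` and the toy weights are hypothetical and are not this feature's real vector.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of weights.

    Assumption: this is what the dashboard's "% of L1" column reports.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(weights[index]) / l1_norm


# Toy vector for illustration only (not taken from the dashboard).
toy_weights = [0.5, -0.25, 0.25]
for i in range(len(toy_weights)):
    print(f"neuron {i}: value {toy_weights[i]:+.2f}, share {l1_share(toy_weights, i):.1%}")
```

Because the shares are taken over absolute values, they sum to 1 across the whole vector, so a small percentage for the top neuron suggests the weight is spread over many neurons.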
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
919     +0.14           0.04
1025    +0.13           0.06
1253    +0.11           0.03
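The two columns can differ for the same neuron because Pearson correlation subtracts each series' mean before comparing, while cosine similarity does not. The sketch below illustrates that relationship in Python. It assumes both measures are computed over paired activation vectors, which this page does not state, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy activations for illustration only (not taken from the dashboard).
feature_acts = [0.0, 0.0, 1.0, 0.0]
neuron_acts = [1.0, 2.0, 3.0, 2.0]
print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity(feature_acts, neuron_acts))
```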
Negative Logits
encomp       -1.22
swarovski    -1.15
felicity     -1.11
embodi       -1.11
indestru     -1.09
Shakspeare   -1.07
brilli       -1.04
<^           -1.04
intersper    -1.01
reft         -0.99
Positive Logits
Pediat       0.62
gubern       0.61
prostitu     0.61
revisor      0.59
nationwide   0.59
lapto        0.58
churras      0.58
Estat        0.58
gymnas       0.55
luka         0.55
Activations Density 0.563%