INDEX
Explanations
Terms related to mental health and medical conditions, particularly depression
Neuron Alignment
Index   Value   % of L₁
1137    +0.13   0.5%
1092    +0.12   0.4%
1026    +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1137    +0.13      0.03
1092    +0.12      0.02
1026    +0.12      0.02
Negative Logits
(!__            -0.58
BoxFit          -0.54
ÑOS             -0.54
WriteBarrier    -0.53
Portale         -0.53
ValueStyle      -0.53
Referencie      -0.52
monies          -0.52
ân              -0.51
MainAxisSize    -0.51
Positive Logits
depression      1.24
Depression      1.22
Depression      1.16
depressed       1.11
depression      1.11
depressions     1.07
depressive      0.90
depress         0.86
pavillon        0.81
«<              0.81
Activations Density: 0.078%