INDEX
Explanations
personal experiences and struggles, especially related to mental health and identity
Neuron Alignment
Index    Value    % of L₁
1919     +0.10    0.3%
227      +0.09    0.2%
674      +0.09    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1919     +0.10       0.07
714      +0.09       0.04
524      +0.09       0.04
Negative Logits
alfab             -0.68
katal             -0.68
fosfor            -0.65
makro             -0.63
kard              -0.63
koz               -0.62
atmosfer          -0.61
<<<<<<<<<<<<<<    -0.61
balon             -0.60
Kategor           -0.60
Positive Logits
unspeak     1.11
myself      1.07
shenan      1.06
disagre     1.05
maneu       1.05
boop        1.04
Myself      1.01
apprehen    1.01
milf        0.99
indescri    0.98
Activations Density 0.482%