INDEX
Explanations
emotional and impactful moments or situations in stories and experiences
Neuron Alignment
Index   Value   % of L₁
1013    +0.13   0.4%
1899    +0.09   0.2%
964     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
509     +0.13      0.06
1490    +0.09      0.06
1969    +0.08      0.05
Negative Logits
tenda       -0.69
polig       -0.64
palio       -0.63
sonda       -0.63
raya        -0.62
masaj       -0.61
Lucía       -0.58
sucul       -0.57
farmacia    -0.57
beneficia   -0.57
Positive Logits
coö         0.63
whenever    0.61
nutella     0.60
Abbé        0.55
realizing   0.55
Reiche      0.54
Gorb        0.54
Rine        0.54
Scherer     0.54
subgoals    0.53
Activations Density 0.375%