INDEX
Explanations
power dynamics and betrayals in human relationships
Neuron Alignment
Index   Value   % of L₁
674     +0.15   0.4%
184     +0.14   0.4%
764     +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1531    +0.15      0.04
184     +0.14      0.02
175     +0.13      0.05
Negative Logits
increa    -1.62
depic     -1.55
encomp    -1.47
strick    -1.43
emphat    -1.43
disagre   -1.42
purcha    -1.42
inev      -1.42
fta       -1.40
secon     -1.40
Positive Logits
다시            0.64
again           0.63
suddenly        0.60
MLLoader        0.59
dann            0.58
then            0.58
weer            0.58
ɵɵ              0.57
wieder          0.56
GraphicsUnit    0.56
Activations Density 0.397%