INDEX
Explanations
references to the actor Tom Cruise
Neuron Alignment
Index   Value   % of L₁
276     +0.12   0.7%
416     +0.12   0.6%
503     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
276     +0.12      0.01
416     +0.12      0.01
383     +0.11      0.01
Negative Logits
ĥ½       -1.94
rapeut   -1.69
¼        -1.60
death    -1.58
ulls     -1.54
ĻĤ       -1.53
plasia   -1.51
male     -1.50
mes      -1.46
ij       -1.45
Positive Logits
ulence    1.50
DOM       1.45
antry     1.45
dale      1.43
esp       1.43
Driver    1.43
als       1.41
ello      1.40
quotes    1.40
blogger   1.39
Activations Density: 0.017%