INDEX
Explanations
terms related to espionage and spying
Neuron Alignment
Index   Value   % of L₁
59      +0.13   0.7%
79      +0.13   0.7%
156     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
59      +0.13      0.01
79      +0.13      0.01
237     +0.12      0.01
Negative Logits
Ļª            -1.71
transgender   -1.60
mourn         -1.56
behalf        -1.53
lonely        -1.48
TRODUCTION    -1.44
Hispanic      -1.42
despair       -1.39
welcome       -1.37
griev         -1.37
Positive Logits
bilt     1.92
burg     1.88
holder   1.81
craft    1.73
ium      1.71
bur      1.68
ious     1.66
hole     1.65
ware     1.64
ieux     1.61
Activations Density: 0.030%