Explanations
words related to legal and organizational actions
Neuron Alignment
Index   Value   % of L₁
25      +0.13   0.5%
1870    +0.13   0.5%
1942    +0.12   0.5%
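The "% of L₁" column most plausibly reports each neuron's share of the L₁ norm of the feature's weight vector; that reading is an assumption, since the dashboard does not define the column. If it holds, a value of 0.13 at 0.5% would imply an L₁ norm of roughly 26, up to rounding. A minimal sketch of the calculation, using a hypothetical helper `l1_share` and toy weights rather than this feature's real ones:

```python
def l1_share(weights):
    """Return each weight's absolute value as a fraction of the vector's L1 norm.

    Assumption: this mirrors what the "% of L1" column reports; the
    dashboard itself does not state the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0 for _ in weights]
    return [abs(w) / l1_norm for w in weights]


# Hypothetical toy vector, not taken from the dashboard.
toy_weights = [0.5, -0.25, 0.25]
shares = l1_share(toy_weights)

for index, (w, s) in enumerate(zip(toy_weights, shares)):
    print(f"{index}\t{w:+.2f}\t{s:.1%}")
```

The shares always sum to 1 for a non-zero vector, so a top neuron at only 0.5% suggests the feature's weight is spread thinly across many neurons.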
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
25      +0.13           0.03
1942    +0.13           0.03
1565    +0.12           0.03
Negative Logits
Cup                  -0.43
中学生               -0.43
뀌                   -0.43
Mig                  -0.42
rim                  -0.42
(no visible token)   -0.41
(no visible token)   -0.41
(no visible token)   -0.41
ri                   -0.41
地道                 -0.41
Positive Logits
action    1.14
ACTION    1.08
Actions   1.07
Action    1.06
action    1.04
actions   1.02
affor     1.02
strick    1.00
scrat     0.98
milf      0.97
Activations Density 0.076%
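Activation density is usually read as the fraction of tokens on which the feature fires at all. On that assumption, 0.076% corresponds to about 76 active tokens per 100,000. A minimal sketch, where the helper `activation_density`, the zero threshold, and the toy activations are all illustrative assumptions rather than the dashboard's own method:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than zero; the dashboard
    does not state which cutoff it uses.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 of 10 tokens.
toy_activations = [0.0, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(toy_activations)
print(f"Activation density: {density:.3%}")
```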