INDEX
Explanations
mentions of forces, troops, and conflicts in different contexts
Neuron Alignment
Index   Value   % of L₁
50      +0.11   0.4%
74      +0.04   0.2%
61      +0.04   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
821     +0.11      0.04
74      +0.04      0.04
624     +0.04      0.04
Negative Logits
<bos>                -1.71
(no visible token)   -0.78
/***                 -0.74
<?                   -0.71
/**                  -0.69
ⓧ                    -0.68
<?                   -0.68
public               -0.67
//---                -0.66
abolish              -0.59
Positive Logits
forces   1.87
Forces   1.74
Forces   1.72
forces   1.56
FORCES   1.45
Minang   1.43
force    1.39
hcm      1.29
nuoc     1.26
Force    1.22
Activations Density: 0.104%