INDEX
Explanations
references to the United States in political and security contexts
Neuron Alignment
Index    Value    % of L₁
1145     +0.13    0.5%
30       +0.12    0.4%
1978     +0.11    0.4%
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
30       +0.13            0.04
1527     +0.12            0.03
1145     +0.11            0.04
Negative Logits
Token            Logit
intptr           -0.66
DebuggerStep     -0.62
ScopeManager     -0.61
referenties      -0.60
Tē               -0.60
hyrchwyd         -0.57
HtmlAttribute    -0.57
<bos>            -0.56
Карьера          -0.56
पया              -0.55
Positive Logits
Token            Logit
Juf              1.44
desir            1.42
fuf              1.41
inev             1.40
Mlle             1.38
guarante         1.35
unlaw            1.35
excru            1.34
perfet           1.33
increa           1.33
Activation Density: 0.075%