Explanations
phrases related to policies or actions in a political or governmental context
Neuron Alignment
Index   Value   % of L₁
604     +0.10   0.3%
674     +0.07   0.2%
1834    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.10      0.04
1497    +0.07      0.02
480     +0.07      0.03
Negative Logits
pama        -0.87
susun       -0.87
lyon        -0.84
fei         -0.81
levis       -0.78
motorola    -0.78
NOO         -0.78
jati        -0.76
nina        -0.76
lein        -0.76
Positive Logits
indirectly      0.70
thereby         0.69
implicitly      0.68
inadvertently   0.59
essentially     0.59
effectively     0.57
unwittingly     0.56
unknowingly     0.55
למע             0.49
basically       0.49
Activations Density: 0.307%