INDEX
Explanations
phrases related to political discussions or diplomatic matters
Neuron Alignment
Index    Value    % of L₁
381      +0.14    0.4%
919      +0.08    0.2%
998      +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1919     +0.14       0.08
1450     +0.08       0.03
331      +0.07       0.05
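The "P. Corr." and "Cos Sim." columns presumably report a Pearson correlation and a cosine similarity between this feature and each listed neuron. The page does not say which vectors are compared, so the following is only a minimal sketch of the two measures on made-up inputs; the function names `pearson` and `cosine` and the sample lists are hypothetical and are not taken from the dashboard.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical data: activations of a feature and a neuron over the same tokens.
feature_acts = [0.0, 0.2, 0.0, 1.1, 0.4]
neuron_acts = [0.1, 0.3, 0.0, 0.9, 0.2]
print(round(pearson(feature_acts, neuron_acts), 3))
print(round(cosine(feature_acts, neuron_acts), 3))
```

The two measures differ only in that Pearson correlation subtracts the mean first, so they can disagree noticeably when the inputs are not centered around zero.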
Negative Logits
karton     -1.23
kafe       -1.17
bandung    -1.14
nuoc       -1.13
ananas     -1.12
antik      -1.12
lele       -1.11
quoc       -1.09
sentra     -1.08
silikon    -1.07
Positive Logits
merely     0.95
only       0.94
instead    0.93
tdessen    0.92
simply     0.85
however    0.81
Instead    0.80
only       0.79
Instead    0.78
nor        0.76
Activations Density 0.579%