Explanations
Phrases related to negative issues and challenges in various contexts.
Neuron Alignment
Index   Value   % of L₁
50      +0.12   0.6%
136     +0.06   0.3%
347     +0.05   0.2%
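The "% of L₁" column can be read as each component's share of the vector's L₁ norm. The sketch below assumes exactly that definition: a component's absolute value divided by the sum of absolute values, with components ranked by magnitude. The function names `l1_norm` and `top_aligned` are hypothetical and are not taken from any dashboard code.

```python
# A minimal sketch, assuming "% of L1" = |component| / sum(|all components|).
# Function names are hypothetical, chosen for illustration only.

def l1_norm(vec: list[float]) -> float:
    """Sum of the absolute values of all components."""
    return sum(abs(x) for x in vec)


def top_aligned(vec: list[float], k: int = 3) -> list[tuple[int, float, float]]:
    """Return (index, value, percent_of_l1) for the k largest-magnitude components."""
    total = l1_norm(vec)
    # Rank indices by absolute value, largest first (ties keep original order).
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    return [(i, vec[i], 100.0 * abs(vec[i]) / total) for i in order[:k]]
```

For example, `top_aligned([0.0, 0.5, -0.25, 0.25], k=2)` returns `[(1, 0.5, 50.0), (2, -0.25, 25.0)]`. Read this way, the first two rows of the table imply an L₁ norm of roughly 20 (0.12 / 0.006 = 0.06 / 0.003 = 20), only approximately because the displayed percentages are rounded.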
Correlated Neurons
Index   P. Corr.   Cos Sim.
630     +0.12      0.06
1259    +0.06      0.06
1158    +0.05      0.05
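The two columns can be illustrated with their standard formulas. This sketch assumes "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and "Cos Sim." is the cosine similarity between two weight vectors. Which vectors the dashboard actually compares is not stated here, and the helper names `pearson` and `cosine_similarity` are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series (e.g. feature vs. neuron activations).

    Raises ZeroDivisionError if either series is constant (zero variance).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

Pearson correlation ignores shifts and rescaling of either series, so `pearson([1, 2, 3], [2, 4, 6])` is 1.0 up to floating-point rounding, while cosine similarity compares raw directions. Because the two measures capture different things, they can disagree for the same neuron.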
Negative Logits
<bos>               -2.13
/**                 -1.04
ⓧ                   -0.93
<?                  -0.91
(token not shown)   -0.91
<?                  -0.87
/*                  -0.84
public              -0.80
#![                 -0.79
/***                -0.78
Positive Logits
jaya        1.73
aen         1.67
stockholm   1.67
maroc       1.65
bandung     1.65
maneu       1.65
hcm         1.65
dises       1.64
affor       1.64
wien        1.63
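Logit lists like these are commonly produced by projecting the feature's output direction onto the vocabulary and keeping the lowest and highest scoring tokens. A minimal sketch, assuming that direct-logit-attribution reading (decoder direction multiplied by the unembedding matrix, ignoring the final layer norm); the matrix layout, toy values, and the name `logit_effects` are hypothetical.

```python
def logit_effects(
    direction: list[float],
    unembed: list[list[float]],
    vocab: list[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (most negative, most positive) lists of (token, score) pairs.

    direction: feature output direction, length d_model (assumed)
    unembed:   d_model x vocab_size matrix stored as a list of rows (assumed)
    vocab:     token string for each column of `unembed`
    """
    d_model = len(direction)
    # Score token j as the dot product of the direction with column j of the unembedding.
    scores = [
        sum(direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive
```

With a toy direction `[1.0, -1.0]`, unembedding `[[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]]`, vocabulary `["a", "b", "c"]`, and `k=1`, the call returns `([("b", -2.0)], [("a", 2.0)])`.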
Activation Density: 0.164%
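Activation density is usually reported as the share of tokens on which a feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero; the threshold and the name `activation_density` are assumptions, not taken from the dashboard.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

Under this reading, a density of 0.164% means the feature is active on roughly 1 token in 610 (1 / 0.00164 ≈ 610).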