INDEX
Explanations
phrases related to public events or interactions, possibly involving humor or controversy
Neuron Alignment
Index    Value    % of L₁
906      +0.17    0.5%
1150     +0.14    0.4%
1343     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
946      +0.17       0.06
736      +0.14       0.06
1533     +0.10       0.03
Negative Logits
hairc     -1.41
milano    -1.29
cannes    -1.28
istan     -1.26
soigne    -1.21
lele      -1.18
matel     -1.18
rafra     -1.17
canel     -1.17
Meksi     -1.16
Positive Logits
Și           0.65
motion       0.59
bcryptjs     0.58
)_/¯         0.56
rspec        0.56
relenting    0.55
Cuánt        0.54
JComboBox    0.54
let          0.53
Cuánto       0.53
Activations Density 0.336%