INDEX
Explanations
information related to various incidents and news events, potentially focusing on violence and safety concerns
Neuron Alignment
Index    Value    % of L₁
2019     +0.17    0.5%
1535     +0.16    0.5%
2034     +0.12    0.3%
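One way to read the "% of L₁" column is as the share of a weight vector's L₁ norm carried by a single neuron. The sketch below works under that assumption, which the page itself does not confirm. The function name `percent_of_l1` and the toy vector are hypothetical and are not this feature's actual weights.

```python
def percent_of_l1(weights, index):
    """Return the share (in percent) of the L1 norm held by weights[index].

    Assumption: "% of L1" means |w_i| / sum_j |w_j|, expressed as a percentage.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Hypothetical toy vector, chosen only to make the arithmetic easy to follow.
toy_weights = [0.5, -0.25, 0.25]
print(percent_of_l1(toy_weights, 0))  # 50.0
```

Under this reading, small percentages such as those in the table would indicate that no single neuron dominates the vector.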
Correlated Neurons
Index    P. Corr.    Cos Sim.
1535     +0.17       0.07
2019     +0.16       0.07
1730     +0.12       0.05
Negative Logits
tupperware    -1.00
ecru          -0.94
camry         -0.92
chrysler      -0.90
embodi        -0.87
Darum         -0.87
unlaw         -0.85
hairc         -0.85
jetta         -0.85
cushi         -0.81
Positive Logits
These           0.95
These           0.87
these           0.85
these           0.67
hese            0.61
Theses          0.60
IMPORTED        0.59
Even            0.57
THESE           0.57
DoubleQuotes    0.57
Activations Density 0.422%
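A density figure like this is commonly read as the fraction of sampled tokens on which the feature fires. The sketch below works under that assumption. The page does not state the exact definition, and the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" counts a token as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy activations over four tokens; only one is non-zero.
toy_activations = [0.0, 0.0, 1.5, 0.0]
print(f"{activation_density(toy_activations):.3%}")  # 25.000%
```

Under this reading, a density of 0.422% would mean the feature is active on roughly 4 of every 1,000 sampled tokens.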