Explanations
phrases describing personal or professional actions taken against individuals
Neuron Alignment
Index    Value    % of L₁
1678     +0.12    0.4%
32       +0.12    0.4%
478      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
32       +0.12       0.10
68       +0.12       0.08
1678     +0.11       0.09
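The two columns of the Correlated Neurons table measure different things. As a minimal sketch, assuming "P. Corr." denotes the Pearson correlation and "Cos Sim." the cosine similarity between two vectors, the difference can be illustrated as follows. The function names and the toy vectors are hypothetical and are not taken from the dashboard's data or code.

```python
import math

def pearson_corr(x, y):
    """Pearson correlation: covariance of the mean-centered vectors, normalized."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def cosine_sim(x, y):
    """Cosine similarity: dot product normalized by the vector lengths (no centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Toy vectors (hypothetical): y is x shifted by a constant offset.
x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]

r = pearson_corr(x, y)   # centering removes the offset
c = cosine_sim(x, y)     # the offset changes the direction of y
```

Because Pearson correlation centers each vector first, a constant offset does not affect it, whereas cosine similarity is sensitive to that offset. This is one reason the two columns can report different values for the same neuron.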
Negative Logits
bordeaux     -0.97
napoli       -0.92
milano       -0.90
lyon         -0.89
fuj          -0.88
écl          -0.87
ibiza        -0.86
oreo         -0.86
thermomix    -0.86
levis        -0.85
Positive Logits
been          0.81
become        0.68
been          0.68
had           0.62
BEEN          0.61
come          0.61
gone          0.61
reportedly    0.60
has           0.56
already       0.56
Activations Density 0.453%