Explanations
negative reviews or experiences
Neuron Alignment
Index   Value   % of L₁
1531    +0.13   0.4%
1403    +0.10   0.3%
1445    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1782    +0.13      0.03
1531    +0.10      0.03
792     +0.08      0.02
Negative Logits
Token       Logit
RSSSF       -0.69
calabaza    -0.59
antropo     -0.59
Economía    -0.59
aarrggbb    -0.59
persecu     -0.57
Bár         -0.57
señor       -0.56
Dibrom      -0.55
Mitä        -0.55
Positive Logits
Token        Logit
intersper    0.98
accla        0.90
maneu        0.89
reviews      0.88
raving       0.88
pegasus      0.87
depic        0.86
snoopy       0.85
resear       0.85
contribut    0.85
Activations Density 0.359%