Explanations
quotes and direct speech
Neuron Alignment

| Index | Value | % of L₁ |
| --- | --- | --- |
| 50 | +0.18 | 1.0% |
| 2019 | +0.07 | 0.4% |
| 871 | +0.06 | 0.3% |

Correlated Neurons

| Index | P. Corr. | Cos Sim. |
| --- | --- | --- |
| 871 | +0.18 | 0.07 |
| 1709 | +0.07 | 0.07 |
| 545 | +0.06 | 0.07 |

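Negative Logits

| Token | Logit |
| --- | --- |
| `<bos>` | -1.59 |
| `ⓧ` | -0.97 |
| `<?` | -0.96 |
|  | -0.85 |
| `/**` | -0.83 |
| `/*` | -0.78 |
| `continue` | -0.70 |
| `/*!` | -0.69 |
| `<?` | -0.67 |
| `/*++` | -0.66 |
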
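Positive Logits

| Token | Logit |
| --- | --- |
| `maneu` | 1.63 |
| `maroc` | 1.59 |
| `affor` | 1.59 |
| `embodi` | 1.56 |
| `stockholm` | 1.52 |
| `roberto` | 1.49 |
| `ricardo` | 1.46 |
| `accla` | 1.46 |
| `lidl` | 1.45 |
| `jorge` | 1.45 |
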
Activations Density 0.280%