INDEX
Explanations
phrases suggesting advice during a challenging situation
Neuron Alignment
Index   Value   % of L₁
1013    +0.20   0.6%
964     +0.11   0.3%
381     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1013    +0.20      0.07
1166    +0.11      0.06
1033    +0.10      0.01
Negative Logits
esigen      -0.66
combusti    -0.64
interse     -0.63
stili       -0.63
Ibidem      -0.62
ideolog     -0.59
peculi      -0.58
brille      -0.58
fornire     -0.57
specifica   -0.57
Positive Logits
<bos>        0.84
relaxing     0.68
leisurely    0.66
outdoors     0.65
hobbies      0.64
outings      0.62
Shakspeare   0.61
relax        0.61
family       0.59
relaxation   0.59
Activations Density: 0.560%