INDEX
Explanations
mentions of negative sentiments or controversy, such as displeasure and buzz surrounding a topic
Neuron Alignment
Index   Value   % of L₁
50      +0.23   0.9%
198     +0.10   0.4%
1372    +0.08   0.3%
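The "% of L₁" column is not defined on this page. A minimal sketch of one plausible reading, assuming it is the share of a weight vector's L₁ norm (sum of absolute values) carried by a single entry. The function name `percent_of_l1` and the toy vector are hypothetical and not taken from the dashboard.

```python
def percent_of_l1(weights, index):
    """Return the share (in percent) of the L1 norm carried by weights[index].

    Assumption: "% of L1" means |w_i| / sum_j |w_j| * 100.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector with hypothetical values (not the dashboard's data).
toy_weights = [0.5, -0.25, 0.25]
share = percent_of_l1(toy_weights, 0)
print(f"{share:.1f}%")
```

Under this reading, a value of +0.23 at 0.9% would imply an L₁ norm of roughly 25 for the full vector, which is consistent with the other two rows.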
Correlated Neurons
Index   P. Corr.   Cos Sim.
1372    +0.23      0.07
1808    +0.10      0.06
333     +0.08      0.07
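For reference, a small sketch of the two statistics these columns appear to report, assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity. Which vectors the dashboard actually compares is not stated here, so the inputs below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical inputs, purely for illustration.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.5]
print(cosine_similarity(u, v), pearson_correlation(u, v))
```

Because Pearson correlation subtracts the mean first, the two numbers can differ noticeably for the same pair of vectors.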
Negative Logits
<bos>              -2.72
ⓧ                  -0.77
<?                 -0.76
/***               -0.75
(no token shown)   -0.73
/*                 -0.64
//{                -0.63
/**                -0.58
<?                 -0.57
deliver            -0.57
Positive Logits
Khart     1.41
Juf       1.34
fortn     1.27
Keny      1.26
unce      1.23
secon     1.22
Muhamma   1.19
Minang    1.18
Sarm      1.18
inext     1.17
Activations Density: 1.786%
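The page does not say how this density is computed. A minimal sketch, assuming it is the percentage of sampled tokens on which the feature's activation is above zero. The function name `activation_density`, the threshold, and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly above the threshold.

    Assumption: "density" counts tokens where the feature fires at all.
    """
    if not activations:
        raise ValueError("no activations supplied")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical per-token activations (not the dashboard's data).
sample = [0.0, 0.0, 1.5, 0.0]
print(f"{activation_density(sample):.3f}%")
```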