INDEX
Explanations
positive adjectives describing things as good, great, or beautiful
Neuron Alignment
Index   Value   % of L₁
994     +0.14   0.5%
1839    +0.11   0.4%
605     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
994     +0.14      0.04
1839    +0.11      0.04
214     +0.10      0.03
Negative Logits
Token      Logit
indestru   -1.04
🤣🤣       -0.99
lavorato   -0.97
ricardo    -0.96
alberto    -0.96
?...       -0.96
sergio     -0.95
jorge      -0.93
scoperto   -0.92
FTFY       -0.91
Positive Logits
Token       Logit
<bos>       0.97
great       0.94
great       0.86
Great       0.85
Great       0.82
GREAT       0.74
excellent   0.64
GREAT       0.63
fantastic   0.63
good        0.61
Activations Density: 0.137%