INDEX
Explanations
emoticons denoting positivity or satisfaction
Neuron Alignment
Index   Value   % of L₁
896     +0.09   0.3%
650     +0.08   0.3%
82      +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1577    +0.09      0.04
1752    +0.08      0.02
896     +0.08      0.03
Negative Logits
speak        -0.57
reduce       -0.57
🤣🤣         -0.57
Hahahahaha   -0.57
spoke        -0.57
Lmfao        -0.55
Lma          -0.55
😩           -0.54
remain       -0.54
fail         -0.54
Positive Logits
:)         1.30
Ottobre    1.21
tph        1.20
:)         1.15
Luglio     1.13
sappi      1.10
^_^        1.08
paradiso   1.07
;;)        1.06
:))        1.05
Activations Density 0.247%