Explanations
words related to feelings or emotions, with a focus on positive or strong feelings
Neuron Alignment
Index    Value    % of L₁
1150     +0.12    0.3%
1013     +0.11    0.3%
2034     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
284      +0.12       0.06
969      +0.11       0.05
249      +0.08       0.05
Negative Logits
Glej         -0.76
enrique      -0.67
notori       -0.67
Pozri        -0.66
ridu         -0.65
DIOS         -0.64
vanta        -0.63
ALLA         -0.63
persua       -0.63
excelente    -0.63
Positive Logits
optik        0.76
öf           0.60
keramik      0.60
kafe         0.58
labd         0.57
kabát        0.57
mikrofon     0.57
kompakt      0.57
ekst         0.56
antik        0.55
Activations Density: 0.329%