Explanations
phrases related to warmth and hospitality
Neuron Alignment
Index    Value    % of L₁
1350     +0.16    0.6%
347      +0.16    0.6%
1491     +0.14    0.5%
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
347      +0.16            0.02
1491     +0.16            0.03
559      +0.14            0.02
Negative Logits
depic      -0.74
encomp     -0.65
volunte    -0.65
lts        -0.64
unden      -0.61
liquido    -0.60
prodi      -0.60
puto       -0.60
bascul     -0.59
?...       -0.59
Positive Logits
warm       1.35
Warm       1.28
warm       1.27
Warm       1.25
WARM       1.20
warming    1.20
warmth     1.17
warms      1.15
warming    1.11
WARM       1.09
Activations Density: 0.068%