Explanations
phrases related to locker rooms
Neuron Alignment
Index   Value   % of L₁
1068    +0.12   0.4%
1363    +0.11   0.4%
1671    +0.11   0.4%
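The "% of L₁" column can be read as each neuron's share of the total absolute weight in the feature's direction. The sketch below illustrates that reading only; it assumes the column is |value| divided by the L₁ norm of the whole vector, which the page itself does not state. The function name `l1_share` and the toy vector are hypothetical.

```python
def l1_share(weights, top_k=3):
    """Return (index, value, share_of_l1) for the top_k largest-magnitude entries.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank indices by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:top_k]]


# Hypothetical toy vector, not taken from the dashboard.
toy = [0.5, -0.25, 0.125, 0.125]
rows = l1_share(toy, top_k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a share as small as 0.4% for the top-ranked neurons would mean the feature's weight is spread thinly across many neurons rather than concentrated in a few.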
Correlated Neurons
Index   P. Corr.   Cos Sim.
976     +0.12      0.02
1363    +0.11      0.02
1573    +0.11      0.01
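The two columns above report different similarity measures, which is why they can disagree. As a minimal sketch, assuming "P. Corr." is the Pearson correlation and "Cos Sim." the cosine similarity between a feature's activations and a neuron's activations over the same tokens (the page does not say exactly which vectors are compared), the two can be computed as follows. The activation lists are hypothetical toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical activations over four tokens: the neuron equals the feature plus a constant offset.
feature_acts = [0.0, 0.0, 1.0, 3.0]
neuron_acts = [2.0, 2.0, 3.0, 5.0]

p_corr = pearson_correlation(feature_acts, neuron_acts)
cos_sim = cosine_similarity(feature_acts, neuron_acts)
print(f"P. Corr. {p_corr:+.2f}  Cos Sim. {cos_sim:.2f}")
```

Because Pearson correlation subtracts each vector's mean first, a constant offset leaves it unchanged while it does change the cosine similarity, so the two columns need not track each other.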
Negative Logits
Token       Logit
philips     -0.83
effe        -0.81
intermitt   -0.78
?...        -0.77
gild        -0.77
eyel        -0.76
fuf         -0.76
scrat       -0.76
helico      -0.75
ugg         -0.74
Positive Logits
Token       Logit
locker      1.33
lockers     1.00
locker      0.88
Locker      0.83
Locker      0.78
lock        0.57
dressing    0.57
closet      0.54
wardrobe    0.53
زندگی       0.52
Activations Density: 0.094%
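Activation density is commonly understood as the fraction of tokens on which a feature fires at all. The sketch below assumes that definition, with "fires" meaning an activation strictly above zero; the dashboard does not state its threshold or the size of the token sample. The function name and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above the threshold.

    Assumption: a token counts as active when its activation exceeds 0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 1000.
toy_acts = [0.0] * 999 + [2.5]
density = activation_density(toy_acts)
print(f"Activations Density: {density:.3%}")
```

On that reading, a density of 0.094% corresponds to the feature firing on roughly one token in a thousand.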