Explanations
phrases related to body image and personal struggles
Neuron Alignment
Index    Value    % of L₁
856      +0.16    0.6%
1978     +0.14    0.5%
764      +0.13    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
2044     +0.16       0.07
764      +0.14       0.04
1056     +0.13       0.05
Negative Logits
ivi        -1.25
utop       -1.22
robus      -1.21
dora       -1.17
palab      -1.17
mef        -1.16
ohr        -1.16
solidar    -1.16
gmbh       -1.15
gend       -1.15
Positive Logits
alot          0.70
withal        0.67
definately    0.61
Shakspeare    0.60
poetical      0.60
really        0.59
whither       0.58
şey           0.58
unspeak       0.57
very          0.56
Activations Density 0.771%