INDEX
Explanations
terms related to social contexts and relationships
Neuron Alignment
Index   Value   % of L₁
111     +0.16   0.9%
40      +0.13   0.7%
410     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
40      +0.16      0.09
452     +0.13      0.09
352     +0.12      0.09
Negative Logits
")]             -1.88
induced         -1.63
stratified      -1.54
supplemented    -1.52
uchs            -1.51
sembles         -1.50
*](#            -1.49
detected        -1.46
uded            -1.42
""              -1.41
Positive Logits
burgh           1.86
voice           1.84
hurst           1.65
hearted         1.59
bay             1.57
love            1.53
stown           1.50
áĢº             1.44
wick            1.40
Sisters         1.39
Activations Density 0.892%