INDEX
Explanations
quotations or reported speech
Neuron Alignment
Index   Value   % of L₁
1150    +0.20   0.6%
453     +0.12   0.4%
674     +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
882     +0.20      0.04
1150    +0.12      0.03
981     +0.11      0.04
Negative Logits
swarovski   -1.57
murano      -1.55
impra       -1.49
volunte     -1.45
scrat       -1.43
vespa       -1.43
thermomix   -1.42
affor       -1.42
cabrio      -1.41
embodi      -1.40
Positive Logits
.      0.89
;      0.76
。     0.73
said   0.71
said   0.71
.”     0.67
)      0.67
,      0.67
।      0.67
:      0.66
Activations Density 0.066%