INDEX
Explanations
phrases related to dialogue or communication
Neuron Alignment
Index   Value   % of L₁
1233    +0.11   0.3%
25      +0.10   0.3%
1053    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1296    +0.11      0.03
25      +0.10      0.03
47      +0.10      0.02
Negative Logits
witcher    -0.63
lgbt       -0.63
celtic     -0.59
imprud     -0.58
excru      -0.57
cuck       -0.57
ufo        -0.56
livel      -0.55
madonna    -0.54
Nils       -0.54
Positive Logits
dirait        0.72
réuss         0.66
particolar    0.61
brille        0.61
augmenté      0.60
tvguidetime   0.60
conseille     0.59
saurait       0.59
constate      0.59
irmingham     0.59
Activations Density 0.161%