INDEX
Explanations
phrases related to social life and interactions
Neuron Alignment
Index   Value   % of L₁
1741    +0.12   0.4%
994     +0.10   0.3%
184     +0.10   0.3%
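
The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it gives each neuron's absolute weight as a share of the L₁ norm (sum of absolute values) of the whole weight vector. A minimal sketch of that calculation follows; the function name `l1_share` and the toy vector are hypothetical and do not come from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means abs(value) / sum(abs(all values)) * 100.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector for illustration only (not data from this page).
toy = [0.5, -0.25, 0.25]
print(l1_share(toy, 0))  # 50.0
```

Under this reading, a value of +0.12 at 0.4% would imply an L₁ norm of roughly 30 for the full vector, which is consistent with the other two rows once rounding is taken into account.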
Correlated Neurons
Index   P. Corr.   Cos Sim.
184     +0.12      0.03
1531    +0.10      0.05
394     +0.10      0.06
Negative Logits
Token         Logit
sergio        -0.94
nicolas       -0.86
roberto       -0.85
alberto       -0.84
javier        -0.82
hcm           -0.79
ricardo       -0.78
utop          -0.78
Departement   -0.77
Ibidem        -0.77
Positive Logits
Token          Logit
and            0.66
Może           0.62
Jakie          0.61
fficacy        0.60
Vielleicht     0.59
したり         0.59
и              0.59
Bardzo         0.58
bijvoorbeeld   0.58
Și             0.58
Activations Density: 0.778%