INDEX
Explanations
phrases related to research studies and academic papers
Neuron Alignment
Index   Value   % of L₁
50      +0.14   0.4%
599     +0.10   0.3%
1317    +0.10   0.3%
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1499    +0.14           0.07
767     +0.10           0.02
441     +0.10           0.05
Negative Logits
raccont     -0.81
dicendo     -0.77
vorrei      -0.73
voleva      -0.73
parlano     -0.71
purtroppo   -0.69
trovo       -0.67
voglio      -0.67
braccia     -0.67
sento       -0.67
Positive Logits
reluct      0.92
maksi       0.92
intersper   0.83
apprehen    0.83
Varan       0.82
depic       0.80
hcm         0.80
impra       0.79
Juf         0.78
shenan      0.78
Activations Density 0.830%