Explanations
words related to rescue and saving
Neuron Alignment
Index    Value    % of L₁
1562     +0.12    0.4%
596      +0.10    0.3%
1187     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
890      +0.12       0.03
1562     +0.10       0.03
596      +0.10       0.03
Negative Logits
grafías      -0.48
devenus      -0.45
linke        -0.44
uestions     -0.43
tristesse    -0.42
pól          -0.42
fallu        -0.41
bbero        -0.41
Flur         -0.41
ensacola     -0.40
Positive Logits
rescue       0.92
rescued      0.90
rescuing     0.89
rescues      0.89
save         0.77
Rescue       0.74
rescue       0.74
Rescue       0.73
saved        0.73
rescu        0.72
Activations Density: 0.124%