INDEX
Explanations
phrases related to evacuation and keeping oneself safe during dangerous situations
Neuron Alignment
Index   Value   % of L₁
946     +0.12   0.3%
1539    +0.09   0.3%
509     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
946     +0.12      0.05
509     +0.09      0.05
1539    +0.08      0.03
Negative Logits
parlar      -0.76
domini      -0.58
protoimpl   -0.56
miras       -0.54
flore       -0.51
rosz        -0.51
impon       -0.51
lende       -0.50
patin       -0.50
liev        -0.50
Positive Logits
evacuate     0.85
flee         0.82
evacuation   0.82
escape       0.79
evacu        0.77
fleeing      0.70
unharmed     0.70
evacuated    0.67
exits        0.67
exodus       0.65
Activations Density: 0.334%