INDEX
Explanations
terms related to guards and protection
Neuron Alignment
Index   Value   % of L₁
411     +0.14   0.5%
596     +0.13   0.5%
1618    +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
411     +0.14      0.02
795     +0.13      0.02
1618    +0.13      0.02
Negative Logits
utafitiHapana   -0.62
iscri           -0.61
verhe           -0.56
تضيفلها   -0.52
imparare        -0.51
espri           -0.49
chiedere        -0.48
provare         -0.48
capire          -0.47
fiore           -0.47
Positive Logits
guard      1.31
guards     1.22
Guard      1.18
guard      1.16
guarding   1.13
Guard      1.10
Guards     1.10
guards     1.07
GUARD      1.01
Guards     0.98
Activations Density: 0.077%