Explanations
contractions with an apostrophe
Neuron Alignment
Index   Value   % of L₁
50      +0.16   0.5%
1967    +0.16   0.5%
1253    +0.13   0.4%
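The Neuron Alignment table lists the neurons that the feature's direction loads on most heavily. Here is a minimal sketch of how such a table could be produced. It assumes "Value" is the feature's raw decoder weight on each neuron and "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder vector. `neuron_alignment` is a hypothetical helper, not the dashboard's own code, and the toy vector is made up.

```python
import numpy as np


def neuron_alignment(decoder_vec, k=3):
    """Hypothetical helper: top-k neurons by |decoder weight|.

    Assumes `decoder_vec` holds the feature's decoder weight for each neuron.
    Returns (neuron index, weight, share of the vector's L1 norm) tuples.
    """
    w = np.asarray(decoder_vec, dtype=float)
    l1 = np.abs(w).sum()
    # Indices of the k largest absolute weights, largest first.
    top = np.argsort(-np.abs(w))[:k]
    return [(int(i), float(w[i]), float(abs(w[i]) / l1)) for i in top]


# Toy vector (made up for illustration): its L1 norm is 5.0.
for idx, value, share in neuron_alignment([0.5, -2.0, 1.5, 1.0]):
    print(f"{idx:<6}{value:+.2f}  {share:.1%}")
```

Because neurons are ranked by absolute value, strongly negative weights would also appear. The source does not say whether the dashboard ranks this way.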
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
19      +0.16           0.13
478     +0.16           0.11
1533    +0.13           0.07
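The Correlated Neurons table compares the feature with individual neurons across tokens. The sketch below makes two assumptions: the correlation column is a Pearson correlation, and the cosine-similarity column compares the feature's per-token activations with each neuron's per-token activations. The source does not say which vectors the dashboard actually compares. Both helper names and the toy data are hypothetical.

```python
import numpy as np


def pearson_and_cosine(feature_acts, neuron_acts):
    """Pearson correlation and cosine similarity of two per-token activation vectors."""
    x = np.asarray(feature_acts, dtype=float)
    y = np.asarray(neuron_acts, dtype=float)
    # Pearson correlation is the cosine similarity of the mean-centred vectors.
    pearson = float(np.corrcoef(x, y)[0, 1])
    # Plain cosine similarity skips the mean-centring step.
    cosine = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    return pearson, cosine


def top_correlated_neurons(feature_acts, neuron_acts, k=3):
    """Hypothetical helper: rank the columns of a (tokens x neurons) matrix by Pearson correlation."""
    acts = np.asarray(neuron_acts, dtype=float)
    rows = [(j, *pearson_and_cosine(feature_acts, acts[:, j])) for j in range(acts.shape[1])]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]


# Made-up activations: 4 tokens, 3 neurons.
feature = [1.0, 2.0, 3.0, 4.0]
neurons = np.array([[4.0, 1.0, 2.0],
                    [3.0, 2.0, 4.0],
                    [2.0, 3.0, 6.0],
                    [1.0, 5.0, 8.0]])
for j, corr, cos in top_correlated_neurons(feature, neurons, k=2):
    print(f"{j:<6}{corr:+.2f}  {cos:.2f}")
```

Mean-centring is the only difference between the two measures. A neuron that is active on most tokens can therefore score high on cosine similarity but low on correlation.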
Negative Logits
actionTypes     -0.68
firebaseConfig  -0.66
nasel           -0.64
wieś            -0.62
classNames      -0.59
polski          -0.58
Fakta           -0.58
Vainqueur       -0.56
userEmail       -0.55
Ofer            -0.55
Positive Logits
effe    0.93
inder   0.89
fers    0.86
wein    0.85
slan    0.85
levis   0.85
wien    0.84
waer    0.84
ciga    0.83
torba   0.82
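The Negative Logits and Positive Logits lists show which output tokens the feature pushes down and up. In a one-layer model, one way to get such lists is to map the feature's decoder direction through the MLP output weights and then the unembedding, and take the most negative and most positive entries. The sketch below assumes that direct path and ignores the final LayerNorm. `logit_effects`, the matrix shapes, and the toy weights are hypothetical.

```python
import numpy as np


def logit_effects(decoder_vec, w_out, w_unembed, vocab, k=3):
    """Hypothetical helper: tokens the feature direction pushes down and up.

    decoder_vec: (n_neurons,)          feature direction in MLP-neuron space
    w_out:       (n_neurons, d_model)  MLP output weights
    w_unembed:   (d_model, n_vocab)    unembedding matrix
    The final LayerNorm is ignored (simplifying assumption).
    """
    logits = np.asarray(decoder_vec) @ np.asarray(w_out) @ np.asarray(w_unembed)
    order = np.argsort(logits)  # ascending
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with made-up weights: 2 neurons, d_model = 2, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_out = np.eye(2)
w_unembed = np.array([[0.5, -1.0, 2.0, 0.0],
                      [9.0, 9.0, 9.0, 9.0]])
neg, pos = logit_effects([1.0, 0.0], w_out, w_unembed, vocab, k=2)
print("Negative:", neg)
print("Positive:", pos)
```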
Activation Density: 1.080%
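Activation density is the share of tokens on which the feature fires. Here is a minimal sketch that counts a token as "active" when the feature's activation is strictly above zero. That threshold and the helper name are assumptions, not taken from the dashboard.

```python
import numpy as np


def activation_density(feature_acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    acts = np.asarray(feature_acts, dtype=float)
    return float((acts > threshold).mean())


# Made-up activations over 8 tokens; 2 are non-zero.
density = activation_density([0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0])
print(f"Activation density {density:.3%}")  # prints "Activation density 25.000%"
```

On this reading, a density of 1.080% means the feature was active on roughly 1 token in 93.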