INDEX
Explanations
contractions of words, especially with a negative connotation or related to warning
Neuron Alignment
Index   Value   % of L₁
1984    +0.08   0.2%
1335    +0.08   0.2%
1795    +0.07   0.2%
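
The "% of L₁" column can be read as each neuron's share of the total absolute weight. The sketch below shows that reading only; it assumes the column is |value| divided by the L₁ norm of the full alignment vector, which the dashboard does not confirm. The function name `l1_share` and the toy vector are hypothetical.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the L1 norm of weights.

    Assumption: this mirrors a "% of L1" style column; the dashboard's
    actual definition is not stated in the source.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, not taken from the dashboard.
toy_weights = [0.5, -0.25, 0.25]
share = l1_share(toy_weights, 0)
print(f"{share:.1%}")
```

Under this reading, a value of +0.08 at 0.2% would imply a total L₁ norm of roughly 40, meaning the alignment is spread thinly across many neurons.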
Correlated Neurons
Index   P. Corr.   Cos Sim.
208     +0.08      0.04
1795    +0.08      0.03
441     +0.07      0.03
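
The two columns above report different quantities. As a rough illustration, assuming "P. Corr." is the Pearson correlation between two activation series and "Cos Sim." is the cosine similarity between two direction vectors, both can be sketched in plain Python. The function names and inputs are hypothetical and are not the dashboard's own code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy inputs, not taken from the dashboard.
print(pearson([1, 2, 3], [2, 4, 6]))
print(cosine_similarity([1, 0], [0, 1]))
```

Pearson correlation subtracts the mean before comparing, while cosine similarity does not, so the two can differ even for the same pair of vectors.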
Negative Logits
Traité        -0.72
mef           -0.68
Secrétaire    -0.68
fup           -0.63
sophie        -0.61
Ministre      -0.61
fta           -0.61
wien          -0.60
aen           -0.59
Chapitre      -0.59
Positive Logits
expandindo    0.71
<bos>         0.59
كويكب         0.52
stagland      0.51
Belén         0.50
Cáceres       0.49
fortawesome   0.49
withRouter    0.48
itemName      0.48
kyllä         0.47
Activations Density 0.226%