INDEX
Explanations
family relationships and emotional statements in news articles
Neuron Alignment
Index   Value   % of L₁
1843    +0.12   0.3%
946     +0.10   0.3%
1539    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
888     +0.12      0.05
1919    +0.10      0.05
946     +0.08      0.04
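
The page does not define the two columns in the Correlated Neurons table. As a rough illustration only, the sketch below assumes "P. Corr." is the Pearson correlation between two activation series and "Cos Sim." is the cosine similarity between two direction vectors. The function names and sample vectors are hypothetical and are not taken from the dashboard.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed meaning of 'P. Corr.')."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed meaning of 'Cos Sim.')."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical activations of a feature and a neuron over the same five tokens.
feature_acts = [0.0, 0.2, 0.0, 0.9, 0.1]
neuron_acts = [0.1, 0.3, 0.0, 0.5, 0.4]
print(round(pearson(feature_acts, neuron_acts), 3))

# Hypothetical direction vectors (for example, weight vectors in the same space).
print(round(cosine([1.0, 0.0, 2.0], [0.5, 1.0, 1.0]), 3))
```

Pearson correlation subtracts the mean before comparing, while cosine similarity does not, so the two numbers can differ even for the same pair of vectors.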
Negative Logits
soggior    -0.95
archivio   -0.84
bandung    -0.84
maroc      -0.82
cammin     -0.82
LIRE       -0.80
cioc       -0.79
tenda      -0.79
torba      -0.78
Luglio     -0.78
Positive Logits
distraught         0.56
grieving           0.54
understandably     0.53
stepfather         0.50
PropertyChanging   0.50
contacted          0.48
tear               0.48
said               0.48
Před               0.47
grief              0.47
Activations Density: 0.316%