INDEX
Explanations
phrases related to specific dates, names, locations, and historical events
Neuron Alignment
Index    Value    % of L₁
50       +0.32    1.9%
1984     +0.14    0.9%
1896     +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1984     +0.32       0.19
1034     +0.14       0.14
1896     +0.12       0.12
Negative Logits
<bos>        -3.47
ⓧ            -0.76
découvre     -0.70
///**        -0.68
préfère      -0.64
<?           -0.63
             -0.62
//----       -0.61
//---        -0.61
ressemble    -0.61
Positive Logits
véhic        1.12
catég        1.04
soulign      1.04
conflic      1.00
considér     0.99
sappi        0.97
marea        0.96
unlaw        0.95
valencia     0.95
délib        0.94
Activations Density 1.178%