INDEX
Explanations
references related to TV show appearances
Neuron Alignment
Index   Value   % of L₁
599     +0.13   0.4%
1533    +0.09   0.3%
198     +0.09   0.3%
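The % of L₁ column can be made concrete with a small Python sketch. It assumes, which this page does not state, that each row lists one of the largest-magnitude entries of the feature's weight vector in the neuron basis, and that % of L₁ is |value| divided by the sum of absolute values over all entries. The function name `l1_alignment` and the toy vector are hypothetical.

```python
def l1_alignment(weights, k=3):
    """Return the k largest-magnitude entries of a weight vector.

    Each row is (index, value, share_of_l1), where share_of_l1 is
    |value| divided by the L1 norm (sum of absolute values) of the vector.
    """
    l1 = sum(abs(w) for w in weights)
    # Indices ordered by magnitude, largest first (ties keep original order)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in order[:k]]


# Hypothetical toy vector, not the weights of the feature above
w = [0.5, -0.25, 0.0, 0.25]
for idx, val, share in l1_alignment(w, k=2):
    print(f"{idx:>4}  {val:+.2f}  {share:.1%}")
```

Because the share is taken relative to the whole vector's L₁ norm, a small share such as 0.4% means the weight is spread across many neurons rather than concentrated in one.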
Correlated Neurons
Index   P. Corr.   Cos Sim.
736     +0.13      0.08
940     +0.09      0.03
981     +0.09      0.07
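The sketch below is minimal and assumes that P. Corr. is the Pearson correlation between this feature's activations and a neuron's activations over the same tokens, and that Cos Sim. is the uncentered cosine similarity of those same two series. The page itself defines neither column, and the activation traces are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length series (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


def pearson(x, y):
    """Pearson correlation: cosine similarity after subtracting each mean.

    Undefined (division by zero) if either series is constant.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


# Hypothetical activation traces over the same five tokens
feature = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron = [0.5, 0.1, 1.5, 0.2, 0.9]
print(f"P. Corr. {pearson(feature, neuron):+.2f}  Cos Sim. {cosine(feature, neuron):.2f}")
```

Pearson correlation is cosine similarity applied after centering each series, so under this reading the two columns can legitimately differ for the same neuron (for example +0.13 versus 0.08 for neuron 736).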
Negative Logits
$(\%)$    -1.05
maksi     -0.94
Byp       -0.87
osal      -0.87
(%)       -0.85
Expt      -0.85
pollut    -0.84
unlaw     -0.84
dovr      -0.84
disagre   -0.83
Positive Logits
<bos>       1.00
hilarious   0.66
sitcom      0.64
comedy      0.64
WebMethod   0.62
humor       0.60
comedic     0.59
Попис       0.58
kloped      0.55
Kanpo       0.53
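Each logit list pairs a token with a value. Assuming those values are per-token logit effects of the feature (for instance its output direction projected through the model's unembedding, which is an assumption, not something this page states), the two lists are simply the bottom-k and top-k entries after sorting. The helper `top_bottom_tokens` and the toy vocabulary and values are hypothetical.

```python
def top_bottom_tokens(logit_effect, vocab, k=3):
    """Return (most negative, most positive) lists of (token, value) pairs."""
    ranked = sorted(zip(vocab, logit_effect), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return negative, positive


# Hypothetical vocabulary and logit effects
vocab = ["funny", "tax", "show", "the", "fine"]
effects = [0.6, -0.8, 0.5, 0.0, -0.9]
neg, pos = top_bottom_tokens(effects, vocab, k=2)
print("Negative:", neg)
print("Positive:", pos)
```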
Activations Density 0.920%
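A sketch of how an activation density figure could be computed, assuming it is the fraction of evaluated tokens on which the feature fires (activation above zero). The `threshold` parameter and the toy activations are hypothetical. On that reading, 0.920% corresponds to about 92 active tokens per 10,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds threshold."""
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical: the feature fires on 10 of 1,000 tokens
acts = [0.0] * 990 + [1.3] * 10
print(f"Activations Density {activation_density(acts):.3%}")
```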