INDEX
Explanations
phrases related to entertainment and leisure activities
Neuron Alignment
Index    Value    % of L₁
50       +0.07    0.2%
492      +0.06    0.2%
783      +0.06    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
783      +0.07       0.03
2030     +0.06       0.02
1861     +0.06       0.03
Negative Logits
<bos>     -1.12
/***      -0.83
tegens    -0.58
public    -0.57
void      -0.55
ⓧ         -0.55
//---     -0.55
qiang     -0.53
send      -0.52
runApp    -0.52
Positive Logits
entertaining    1.58
entertain       1.29
entertained     1.23
Juf             1.12
fei             1.08
catég           1.06
sii             1.06
maneu           1.06
idr             1.06
fup             1.05
Activations Density: 0.259%