Explanations
phrases related to comfort and discomfort
Neuron Alignment
Index   Value   % of L₁
50      +0.18   1.1%
1047    +0.12   0.7%
1805    +0.11   0.6%
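The Neuron Alignment table lists the neurons that carry the largest weights in the feature's decoder direction. Below is a minimal sketch of how such a table could be produced. It assumes "Value" is the raw decoder weight on that neuron and "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder vector. The function name `neuron_alignment` and the toy vector are hypothetical and are not this feature's actual weights.

```python
import numpy as np

def neuron_alignment(decoder_vec: np.ndarray, k: int = 3):
    """Return the top-k (index, value, fraction of L1 norm) entries.

    Assumption: entries are ranked by signed value (largest first), and
    '% of L1' is |weight| / sum(|weights|) over the whole decoder vector.
    """
    l1 = np.abs(decoder_vec).sum()
    top = np.argsort(decoder_vec)[::-1][:k]  # indices of the largest weights
    return [(int(i), float(decoder_vec[i]), float(abs(decoder_vec[i]) / l1)) for i in top]

# Hypothetical 6-dimensional decoder vector whose L1 norm is 1.0.
toy = np.array([0.05, -0.10, 0.40, 0.25, -0.05, 0.15])
for idx, val, frac in neuron_alignment(toy):
    print(f"{idx:<6}{val:+.2f}  {frac:.1%}")
```

A small percentage of L₁ for the top neuron (as with 1.1% above) suggests the direction is spread across many neurons and is not aligned with any single one.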
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1047    +0.18           0.03
1805    +0.12           0.03
892     +0.11           0.03
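The Correlated Neurons table reports two different similarity measures, so they need not agree. Below is a minimal sketch of both. It assumes the Pearson correlation is taken over per-token activations of the feature and the neuron, and that the cosine similarity compares two weight vectors; which vectors the dashboard actually compares is not stated here. The activation traces are hypothetical.

```python
import numpy as np

def pearson(x, y) -> float:
    """Pearson correlation between two equal-length activation traces."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()  # center each trace
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def cosine(u, v) -> float:
    """Cosine similarity between two weight vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical per-token activations for one feature and one neuron.
feature_acts = [0.0, 0.0, 1.5, 0.0, 2.0, 0.0]
neuron_acts = [0.1, -0.2, 0.9, 0.0, 1.1, 0.3]
print(round(pearson(feature_acts, neuron_acts), 3))
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors give 0.0
```

Because correlation depends on co-activation across tokens while cosine similarity depends only on weight geometry, a neuron can fire together with the feature (positive correlation) while its weights point in a nearly orthogonal direction (cosine near 0).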
Negative Logits
<bos>     -3.00
/**       -0.96
          -0.92
ⓧ         -0.92
/***      -0.92
<?        -0.81
<?        -0.77
///**     -0.74
/*!       -0.72
#![       -0.72
Positive Logits
quoc      1.07
Comfort   1.06
Comfort   1.06
saar      1.05
COMFORT   1.02
kasa      1.02
bandung   1.02
maroc     1.02
jawa      1.01
kela      0.99
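The positive and negative logit lists rank vocabulary tokens by how strongly the feature's output direction pushes their logits up or down. Below is a minimal sketch of that ranking. It assumes the scores come from projecting the decoder direction directly through the unembedding matrix `W_U`, with no final LayerNorm applied; real dashboards may normalize differently. The function name, the 2-dimensional toy `W_U`, and the 4-token vocabulary are all hypothetical.

```python
import numpy as np

def logit_effects(decoder_vec, W_U, vocab, k=2):
    """Project a feature direction onto the unembedding and rank tokens.

    Assumption: score = decoder_vec @ W_U (no LayerNorm or bias).
    Returns (top_k_positive, top_k_negative) as (token, score) pairs.
    """
    scores = decoder_vec @ W_U              # one score per vocabulary token
    order = np.argsort(scores)              # ascending order
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
vocab = ["Comfort", "comfort", "/**", "<?"]
W_U = np.array([[1.0, 0.8, -1.0, -0.3],
                [0.0, 0.2,  0.0, -0.6]])
direction = np.array([1.0, 0.5])

pos, neg = logit_effects(direction, W_U, vocab)
print("positive:", pos)
print("negative:", neg)
```

Sorting once and reading both ends of the same ordering gives the positive and negative lists from a single projection.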
Activation Density: 0.076%
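Activation density is the fraction of tokens on which the feature is active. At 0.076%, that is roughly 76 of every 100,000 tokens, or about 1 token in 1,300. Below is a minimal sketch, assuming "active" means an activation strictly greater than zero (typical for ReLU-style sparse autoencoders; the dashboard's exact threshold is not stated). The activation array is synthetic.

```python
import numpy as np

def activation_density(acts, threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: a token counts as active when activation > threshold (0 by default).
    """
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Synthetic example: 76 active tokens out of 100,000.
acts = np.zeros(100_000)
acts[:76] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.076%
```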