INDEX
Explanations
phrases indicating a high frequency or intensity of an experience or evaluation
Head Attribution Weights (head: weight)
0: 0.02
1: 0.02
2: 0.08
3: 0.30
4: 0.01
5: 0.02
6: 0.12
7: 0.04
8: 0.06
9: 0.05
10: 0.11
11: 0.12
Negative Logits (token  logit)
ーテ  -1.50
orer  -1.36
inav  -1.30
aring  -1.28
Chamberlain  -1.28
ouri  -1.25
uer  -1.23
alian  -1.23
enment  -1.22
ONSORED  -1.20
Positive Logits (token  logit)
illet  1.32
resume  1.32
resumes  1.30
bart  1.25
Trade  1.20
funer  1.15
atom  1.09
punk  1.07
uple  1.03
---------------  1.03
Activation Density 0.008%
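Activation density is commonly read as the fraction of token positions on which the feature fires, so 0.008% corresponds to roughly 8 active tokens per 100,000. The sketch below illustrates that reading under stated assumptions: the `activation_density` helper and the "fires when activation > 0" threshold are hypothetical choices for illustration, not the dashboard's own code, and the sample data is synthetic.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample: 8 active positions out of 100,000 tokens.
sample = [0.0] * 99_992 + [1.5] * 8
density = activation_density(sample)
print(f"{density * 100:.3f}%")  # prints 0.008%
```

At this density the feature is very sparse, which fits an explanation tied to a fairly specific class of phrases.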