INDEX
Explanations
phrases related to doubts and concerns in various contexts
Head Attribution Weights
0:0.02
1:0.02
2:0.45
3:0.09
4:0.09
5:0.02
6:0.05
7:0.06
8:0.03
9:0.02
10:0.04
11:0.04
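As a minimal sketch of how the head weights above might be read, the snippet below ranks the heads and reports the dominant one. The values are copied from the list; the names `head_weights` and `rank_heads` are hypothetical, and treating the weights as shares of a common total is an assumption. The displayed values are rounded, so they sum to 0.93 rather than 1.

```python
# Head index -> attribution weight, copied from the list above.
head_weights = {
    0: 0.02, 1: 0.02, 2: 0.45, 3: 0.09, 4: 0.09, 5: 0.02,
    6: 0.05, 7: 0.06, 8: 0.03, 9: 0.02, 10: 0.04, 11: 0.04,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by weight, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_weights)
top_head, top_weight = ranked[0]

# Sum of the displayed (rounded) weights; assumed, not confirmed, to be
# a normalized distribution before rounding.
total = sum(head_weights.values())

print(f"dominant head: {top_head} (weight {top_weight:.2f})")
print(f"sum of displayed weights: {total:.2f}")
```

Head 2 carries 0.45 of the attribution, five times the weight of the next heads (3 and 4, at 0.09 each).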
Negative Logits
grate        -1.73
accordingly  -1.58
ensor        -1.54
ank          -1.44
reon         -1.43
pecially     -1.41
instructor   -1.37
loader       -1.36
ische        -1.36
cyn          -1.35
Positive Logits
votes                              1.74
Kinnikuman                         1.69
Houses                             1.62
doms                               1.61
oneliness                          1.60
ativity                            1.56
isma                               1.54
upe                                1.53
vity                               1.53
=================================  1.52
Activation Density: 0.837%