Explanations
phrases and structures that indicate subjective opinions and statements of belief
Head Attr Weights
0:0.03
1:0.02
2:0.07
3:0.17
4:0.28
5:0.04
6:0.04
7:0.07
8:0.06
9:0.04
10:0.05
11:0.07
Negative Logits
Juda         -1.46
Lithuan      -1.45
ゼ           -1.44
Winged       -1.39
Palest       -1.37
黒           -1.37
Incredible   -1.36
Wolves       -1.28
sqor         -1.27
Hels         -1.27
Positive Logits
vantage      1.62
orse         1.60
uscript      1.58
nor          1.57
necessarily  1.49
anything     1.43
orney        1.40
ynes         1.36
ential       1.35
ndra         1.34
Activations Density: 0.011%