INDEX
Explanations
phrases indicating uncertainty or questioning established perceptions
Head Attr Weights
0:0.02
1:0.01
2:0.27
3:0.13
4:0.10
5:0.04
6:0.11
7:0.03
8:0.04
9:0.09
10:0.07
11:0.03
Negative Logits
Chase     -1.50
Scare     -1.37
Argon     -1.36
Haas      -1.36
Drivers   -1.34
alike     -1.33
Nico      -1.32
Mono      -1.32
Karin     -1.31
Brid      -1.30
Positive Logits
"]=>      2.11
sqor      1.98
裏�       1.90
EStream   1.72
manuel    1.67
ゼウス    1.65
inion     1.61
obyl      1.61
factor    1.60
culus     1.58
Activations Density 0.031%