INDEX
Explanations
specific phrases and expressions related to subjective opinions and assertions
Head Attr Weights
0:0.01
1:0.05
2:0.22
3:0.14
4:0.01
5:0.08
6:0.07
7:0.07
8:0.11
9:0.08
10:0.06
11:0.04
Negative Logits
Trials      -1.15
ombat       -1.05
slaught     -1.00
WARRANT     -0.98
©           -0.93
Challenge   -0.91
Result      -0.90
earable     -0.87
Kelvin      -0.87
Rampage     -0.86
Positive Logits
UCT         1.06
pupp        1.04
alam        1.02
hematic     0.99
rael        0.98
here        0.96
nob         0.94
agog        0.92
esides      0.91
helicop     0.90
Activations Density 0.122%