Explanations
negative sentences or phrases
Head Attribution Weights
Head    0     1     2     3     4     5     6     7     8     9     10    11
Weight  0.08  0.07  0.08  0.08  0.07  0.09  0.08  0.08  0.07  0.08  0.09  0.08
Negative Logits
Token                 Logit
¨                     -3.62
[/                    -3.26
Avg                   -3.09
Newport               -2.96
quickShipAvailable    -2.96
Connecticut           -2.78
Blizz                 -2.76
ijn                   -2.75
Melt                  -2.70
Rhode                 -2.70
Positive Logits
Token                 Logit
Org                   2.91
merce                 2.75
violently             2.73
Fight                 2.67
Ur                    2.66
violent               2.44
pose                  2.41
submission            2.41
aunders               2.41
osate                 2.39
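Top positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most extreme tokens. The page does not say how its lists were computed, so the sketch below assumes that method. The function names, the toy vocabulary, and the toy weights are hypothetical and only illustrate the ranking step.

```python
from typing import List, Sequence, Tuple


def direct_logit_effects(
    w_dec: Sequence[float], w_u: Sequence[Sequence[float]]
) -> List[float]:
    """Logit effect of one feature on every vocabulary token.

    Assumption (hypothetical): w_dec is the feature's decoder direction with
    length d_model, and w_u is the unembedding matrix laid out as
    d_model rows by n_vocab columns.
    """
    n_vocab = len(w_u[0])
    # Effect on token j is the dot product of w_dec with column j of w_u.
    return [
        sum(w_dec[d] * w_u[d][j] for d in range(len(w_dec)))
        for j in range(n_vocab)
    ]


def top_k_tokens(
    vocab: Sequence[str], logits: Sequence[float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    # Sort token indices by logit, ascending (sorted() is stable on ties).
    order = sorted(range(len(logits)), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Take the k largest and list them from highest to lowest.
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with a 2-dimensional model and a 5-token vocabulary (hypothetical data).
vocab = ["Org", "violent", "Newport", "Avg", "the"]
w_dec = [1.0, -1.0]
w_u = [
    [3.0, 2.0, -1.0, -2.0, 0.0],
    [0.0, 0.0, 2.0, 0.5, 0.0],
]
negative, positive = top_k_tokens(vocab, direct_logit_effects(w_dec, w_u), k=2)
print(negative)  # [('Newport', -3.0), ('Avg', -2.5)]
print(positive)  # [('Org', 3.0), ('violent', 2.0)]
```

Ranking a single projected vector keeps the sketch independent of any particular interpretability library; with a real model the two loops would normally be replaced by one matrix-vector product.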
Activation Density: 0.000%