INDEX
Explanations
phrases related to warnings or urgent calls to action
Head Attr Weights
0:0.06
1:0.02
2:0.07
3:0.06
4:0.06
5:0.03
6:0.32
7:0.03
8:0.05
9:0.06
10:0.06
11:0.12
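The head attribution weights above can be read programmatically. The following is a minimal sketch, assuming each line is a `head:weight` pair; the helper name `parse_head_weights` is hypothetical and not part of any dashboard API. It picks out the head with the largest weight and sums the displayed values, which need not total exactly 1, presumably because they are rounded to two decimals.

```python
# Head attribution weights copied verbatim from the list above ("head:weight").
RAW_HEAD_WEIGHTS = """0:0.06
1:0.02
2:0.07
3:0.06
4:0.06
5:0.03
6:0.32
7:0.03
8:0.05
9:0.06
10:0.06
11:0.12"""


def parse_head_weights(text):
    """Hypothetical helper: turn 'head:weight' lines into {head_index: weight}."""
    weights = {}
    for line in text.splitlines():
        head, _, value = line.partition(":")
        weights[int(head)] = float(value)
    return weights


weights = parse_head_weights(RAW_HEAD_WEIGHTS)

# Head with the largest attribution weight.
top_head = max(weights, key=weights.get)

# Sum of the displayed (rounded) weights, itself rounded to two decimals.
total = round(sum(weights.values()), 2)

print(f"dominant head: {top_head} (weight {weights[top_head]})")
print(f"sum of displayed weights: {total}")
```

On the values listed here, head 6 carries the largest share of the attribution.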
Negative Logits
idon:-1.57
library:-1.55
vice:-1.50
azeera:-1.50
bra:-1.49
deals:-1.49
midt:-1.44
azine:-1.41
riot:-1.38
ource:-1.37
Positive Logits
cknow:1.66
ciation:1.59
Cth:1.53
Promotion:1.48
エル:1.48
execute:1.47
サーティワン:1.43
Lenin:1.43
Done:1.42
Kan:1.40
Activations Density 0.000%