INDEX
Explanations
phrases indicating opinions or evaluations
quotation marks and related discourse markers
Negative Logits
scares          -0.76
Flavoring       -0.72
guiActiveUn     -0.69
summarizes      -0.67
NET             -0.64
veil            -0.64
版              -0.63
formations      -0.63
accompanies     -0.62
loopholes       -0.62
Positive Logits
absolutely       1.04
done             1.00
really           0.99
ready            0.98
likely           0.97
completely       0.96
extremely        0.90
appropriately    0.89
fed              0.89
still            0.87
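The two lists are the extremes of one ranking over the vocabulary. Below is a minimal sketch of reading them off, assuming each value is the feature's direct contribution to that token's logit. One common way to get that contribution is to project the decoder direction through the unembedding; that is an assumption, since the page does not say how the values were computed. The helper name `top_and_bottom` is hypothetical, and the input is a subset of the values listed above.

```python
def top_and_bottom(logit_effects, k):
    """Return the k most positive and the k most negative (token, value) pairs.

    Assumes logit_effects maps each token to the feature's logit contribution.
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])  # ascending by value
    bottom = ranked[:k]                   # most negative first
    top = list(reversed(ranked[-k:]))     # most positive first
    return top, bottom


# A subset of the values shown on this page.
effects = {
    "absolutely": 1.04,
    "done": 1.00,
    "really": 0.99,
    "scares": -0.76,
    "Flavoring": -0.72,
    "guiActiveUn": -0.69,
}
top, bottom = top_and_bottom(effects, 2)
print(top)     # [('absolutely', 1.04), ('done', 1.0)]
print(bottom)  # [('scares', -0.76), ('Flavoring', -0.72)]
```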
Activation density: 0.221%
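This sketch assumes a common definition that the page does not spell out: activation density is the percentage of sampled tokens on which the feature's activation is strictly positive. Under that reading, 0.221% means the feature fires on roughly 221 of every 100,000 tokens. The function name and the sample below are hypothetical.

```python
def activation_density(activations):
    """Percentage of tokens whose feature activation is > 0 (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 2 of 1,000 tokens.
sample = [0.0] * 998 + [3.1, 0.4]
print(activation_density(sample))  # 0.2
```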