INDEX
Explanations
phrases related to causing controversy or strong reactions
phrases associated with provocation and public outrage
Negative Logits
tags      -0.82
models    -0.77
Rules     -0.75
parts     -0.75
coins     -0.74
arte      -0.74
words     -0.74
rics      -0.72
sheets    -0.71
アル      -0.70
Positive Logits
lot          1.17
flurry       1.07
rouse        1.05
plethora     1.05
hefty        1.04
barrage      1.02
slew         1.02
lengthy      1.00
resurgence   0.98
reversal     0.97
Activations Density 0.265%
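
As a rough illustration of what an activation density figure like the one above can mean, the sketch below assumes density is the fraction of token positions at which the feature fires, that is, where its activation exceeds some threshold. That definition, the function name activation_density, the threshold of 0.0, and the sample values are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 2000 token positions, 5 of which have a nonzero activation.
sample = [0.0] * 1995 + [0.4, 1.2, 0.1, 2.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # 5 / 2000 positions active
```

Under this reading, a density of 0.265% would correspond to the feature being active on roughly 1 in 377 token positions.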