Explanations
mentions of legal or controversial matters
punctuation and specific numerical contexts
Negative Logits
Token       Logit
ensity      -0.69
iphate      -0.69
ouble       -0.67
mble        -0.67
Explore     -0.67
"}],"       -0.66
rossover    -0.66
rawl        -0.65
icult       -0.65
omever      -0.64
Positive Logits
Token       Logit
prompting   1.30
thereby     1.29
causing     1.20
triggering  1.16
resulting   1.15
sparking    1.13
forcing     1.12
thus        1.06
ruining     1.05
depri       1.05
Activations Density: 0.369%