INDEX
Explanations
mentions or instances of the word "trigger"
Negative Logits
Token        Logit
hemat        -0.86
ately        -0.82
esan         -0.82
apolis       -0.79
ensable      -0.77
iance        -0.76
cipled       -0.75
cott         -0.73
ians         -0.73
opathy       -0.73
Positive Logits
Token        Logit
triggering   1.18
trigger      1.13
triggers     1.12
trigger      1.10
Trigger      0.94
warnings     0.93
triggered    0.92
chnology     0.90
witz         0.89
guiActiveUn  0.79
Activations Density 10.707%