Explanations
instances of the word "trigger" and its variants, indicating reactions or responses to situations
Negative Logits
Token          Logit
ä½į            -0.16
ä¹ħ            -0.15
UST            -0.15
ugo            -0.15
/md            -0.15
ake            -0.14
ikt            -0.14
ties           -0.14
ynchronize     -0.14
imony          -0.14
Positive Logits
Token          Logit
-response      0.18
63             0.17
yen            0.16
ivate          0.16
æĿIJ           0.15
ingly          0.15
les            0.15
363            0.15
znik           0.14
pow            0.14
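The two tables list the tokens whose output logits this feature most suppresses and most promotes. A minimal sketch of how such lists are commonly produced, assuming (as is typical for sparse-autoencoder dashboards, though this page does not say so) that the values come from multiplying the feature's decoder direction by the model's unembedding matrix. The names `decoder_dir`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical, and real dashboards may apply extra steps such as final layer-norm scaling.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    decoder_dir: feature decoder direction, shape (d_model,)  [hypothetical input]
    W_U:         unembedding matrix, shape (d_model, d_vocab)  [hypothetical input]
    vocab:       list of token strings, length d_vocab
    """
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)  # direct effect on each token's logit
    order = np.argsort(logits)                           # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Tiny synthetic example: with a one-hot decoder direction, the logits are just one row of W_U.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.1, -0.2, 0.3, 0.0],
                [0.5,  0.5, 0.5, 0.5]])
neg, pos = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(neg, pos)
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other; for a full vocabulary, `np.argpartition` would avoid a full sort if only the top k are needed.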
Activation Density: 0.113%
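Activation density is usually the percentage of token positions at which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard does not state its threshold). At 0.113%, that works out to roughly 113 of every 100,000 tokens. The function name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Synthetic example: the feature fires on 113 out of 100,000 token positions.
acts = np.zeros(100_000)
acts[:113] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")
```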