INDEX
Explanations
mentions of the word "kill" in various contexts
Negative Logits
opause          -0.16
unc             -0.15
ew              -0.14
Mush            -0.14
-up             -0.14
Posts           -0.14
zzo             -0.14
ole             -0.13
ent             -0.13
eto             -0.13
Positive Logits
switch          0.21
roys            0.18
AdapterFactory  0.18
switches        0.18
ough            0.17
Switch          0.17
_dependency     0.17
价              0.17
_SWITCH         0.16
ingly           0.16
Activations Density: 0.009%