Explanations
phrases related to inappropriate behavior
instances of the word "inappropriate" in various contexts
Negative Logits
Token        Logit
ership       -0.88
rix          -0.84
abiding      -0.81
adr          -0.78
ript         -0.78
ingen        -0.77
ynthesis     -0.77
hyde         -0.76
bern         -0.76
uster        -0.75
Positive Logits
Token             Logit
inappropriate     1.01
improper          0.86
undermin          0.84
inappropriately   0.82
awaken            0.81
interference      0.80
misuse            0.80
interfere         0.79
interfered        0.77
behaviour         0.77
Activations Density: 0.011%