INDEX
Explanations
hypothetical situations and prompts starting with "Imagine"
hypothetical scenarios or imaginative situations
Negative Logits
agendas       -0.68
irm           -0.66
в             -0.65
Quality       -0.65
idelines      -0.65
keep          -0.64
rightfully    -0.64
Nonetheless   -0.62
NOT           -0.61
Always        -0.60
Positive Logits
scenario       0.90
situation      0.77
agine          0.77
scenarios      0.73
termin         0.73
hypothetical   0.72
pty            0.72
dystop         0.70
someday        0.67
opian          0.67
Activations Density: 0.158%