INDEX
Explanations
words related to intentionality or purposeful action
words and phrases indicating intentionality or deliberate actions
Negative Logits
addons      -0.85
soon        -0.79
Parables    -0.75
esc         -0.72
Warrant     -0.71
Citation    -0.69
anon        -0.69
norm        -0.69
rose        -0.68
ĸļ          -0.68
Positive Logits
misleading      0.87
misrepresent    0.84
mislead         0.80
provoking       0.80
provocative     0.79
misled          0.78
sabot           0.78
obfusc          0.78
dece            0.75
sabotage        0.75
Activations Density 0.026%