Explanations
adverbs or adjectives related to deliberate or intentional actions
words indicating intentionality or deliberate action
Negative Logits
soon          -0.80
addons        -0.74
Tycoon        -0.71
Score         -0.70
busters       -0.69
Citation      -0.68
esc           -0.68
til           -0.67
front         -0.65
rooms         -0.65
Positive Logits
misrepresent   0.81
misleading     0.77
reinvent       0.77
ォ             0.77
fals           0.76
misled         0.72
coded          0.72
planted        0.72
omitted        0.71
obfusc         0.71
Activation Density: 0.019%