Explanations
phrases related to intentional or deliberate actions
terms related to intentional deceit or misleading actions
Negative Logits
Token       Logit
raph        -0.72
NOW         -0.72
masters     -0.68
soon        -0.67
Warriors    -0.67
yip         -0.66
Liter       -0.66
Warrant     -0.65
ival        -0.65
Legend      -0.65
Positive Logits
Token           Logit
mislead         1.06
sabot           1.03
misled          1.01
obfusc          1.01
misrepresent    1.00
misleading      0.99
fals            0.98
omitted         0.96
omit            0.95
dece            0.95
Activations Density: 0.059%