Explanations
statements containing the word "not"
negations or instances of non-authorization in statements
Negative Logits
opia            -0.65
aspiration      -0.62
Puzzles         -0.62
supremacy       -0.62
soDeliveryDate  -0.62
generational    -0.60
nightmare       -0.59
Solitaire       -0.59
Conversation    -0.59
Pop             -0.58
Positive Logits
disclosed       1.13
immediately     1.11
disclose        1.02
disclosing      0.99
divul           0.96
formally        0.95
ifies           0.93
explicitly      0.89
publicly        0.89
discl           0.89
Activation density: 0.120%