INDEX
Explanations
verbs or phrases indicating information disclosure or lack thereof
statements related to non-disclosure and specification of information
Negative Logits
joice         -0.63
ngth          -0.61
jam           -0.60
ruction       -0.60
oir           -0.60
ét            -0.59
cosystem      -0.58
ãĤ¨           -0.56
CHA           -0.56
âī            -0.54
Positive Logits
specifics     1.33
whether       1.29
nor           1.03
whether       1.02
specific      1.00
why           0.99
particulars   0.99
any           0.92
how           0.91
exact         0.89
Activations Density 0.142%
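
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number is typically computed, assuming it means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard; the sample size was chosen only so the arithmetic reproduces the displayed percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. This is an illustrative definition, not the
    dashboard's confirmed implementation.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 142 of which activate.
# These counts are made up for illustration; the real sample size is unknown.
sample = [0.0] * 99_858 + [0.5] * 142
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature fires on roughly one token in seven hundred, which fits a feature tied to a narrow family of phrasings.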