Explanations
phrases related to withholding information or being vague
details or specifics related to sensitive information or restrictions
Negative Logits
ometimes    -0.66
inary       -0.63
aldi        -0.63
itone       -0.62
Tes         -0.61
unct        -0.61
joice       -0.60
ructose     -0.60
Tube        -0.59
ilege       -0.58
Positive Logits
specifics       1.06
details         0.99
definitively    0.92
divul           0.87
displayText     0.85
redacted        0.83
confirming      0.83
whereabouts     0.82
definitive      0.81
explan          0.79
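The positive and negative logit lists read like the feature's direct effect on next-token logits. This minimal sketch assumes that is how they were produced: the feature's decoder direction is projected through the model's unembedding matrix, and the highest and lowest scoring tokens are kept. The page does not confirm this method. The function and parameter names (`top_logit_effects`, `w_dec_f`, `W_U`, `vocab`) are hypothetical, and the sketch ignores any final layer-norm scaling a real tool might fold in.

```python
import numpy as np

def top_logit_effects(w_dec_f, W_U, vocab, k=10):
    """Rank tokens by one feature's direct effect on the logits (hypothetical helper).

    w_dec_f: decoder direction for a single feature, shape (d_model,)
    W_U:     unembedding matrix, shape (d_model, d_vocab)
    vocab:   list of token strings, length d_vocab
    """
    logits = w_dec_f @ W_U               # one score per vocabulary token
    order = np.argsort(logits)           # indices sorted from lowest to highest score
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
w = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 1.0, -0.7],
                [9.0,  9.0, 9.0,  9.0]])
neg, pos = top_logit_effects(w, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('c', 1.0), ('a', 0.5)]
```

Under this reading, subword fragments such as "divul" and "explan" appear because the ranking covers raw vocabulary entries, not whole words.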
Activation Density: 0.702%
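Activation density is usually reported as the share of token positions where the feature's activation is above zero. At 0.702%, that works out to about 7 of every 1,000 tokens. This sketch assumes that definition, which the page does not state. The helper name `activation_density` is hypothetical.

```python
import numpy as np

def activation_density(acts):
    """Percentage of token positions where the feature activation is positive (hypothetical helper)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# One firing position out of four gives a density of 25%.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```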