INDEX
Explanations
phrases related to secrets or confidential information
instances of the word "confide" and its variations
Negative Logits
ï¸ı          -0.81
Tur          -0.60
Roof         -0.59
practicable  -0.58
hardest      -0.57
Towards      -0.57
Ô            -0.57
Gamb         -0.56
Hope         -0.55
questioning  -0.55
Positive Logits
ederation    1.42
essional     1.36
luence       1.35
etti         1.25
eder         1.24
lag          1.20
ection       1.18
lated        1.14
idences      1.12
lation       1.11
Activations Density 0.019%
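
The page does not say how the activation density is computed. A common convention, assumed here, is the share of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name, the threshold parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions where the feature fires.

    Assumption: "fires" means the activation is strictly above `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 19 of 100,000 token positions.
sample = [0.0] * 99_981 + [1.0] * 19
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.019% would mean the feature is active on roughly 19 of every 100,000 tokens, which fits a feature that responds to a narrow set of words.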