Explanations
phrases related to suspicion or underlying control
NEGATIVE LOGITS
Token        Logit
Rating       -0.80
sts          -0.75
oria         -0.72
diplomacy    -0.66
ggles        -0.65
rative       -0.63
staking      -0.63
バ           -0.62
Operation    -0.61
agitation    -0.61
POSITIVE LOGITS
Token        Logit
overlooking   0.70
clothed       0.68
mocked        0.67
exploited     0.67
tasked        0.67
flanked       0.66
(%)           0.66
vulnerable    0.65
surrounded    0.64
ignored       0.64
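The page does not say how the logit lists were produced. A common convention for SAE feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the tokens with the largest and smallest scores. The sketch below assumes that convention. The function names (logit_effects, top_and_bottom), the row-major layout of the unembedding, and all numbers are hypothetical and chosen only for illustration. They are not this feature's real weights.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# ASSUMPTION: the effect is the dot product of the feature's decoder direction
# with each column of the unembedding matrix (one column per vocab token).
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Return one score per token: decoder_dir . unembed[:, t].

    `unembed` is laid out as d_model rows x vocab_size columns.
    """
    vocab_size = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
            for t in range(vocab_size)]


def top_and_bottom(vocab: Sequence[str], scores: Sequence[float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    pairs = list(zip(vocab, scores))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with invented numbers (NOT the real model's weights).
vocab = ["overlooking", "clothed", "Rating", "sts"]
decoder_dir = [1.0, -0.5]          # toy 2-dimensional decoder direction
unembed = [
    [0.4, 0.3, -0.2, -0.1],        # d_model dimension 0
    [-0.2, -0.2, 0.6, 0.2],        # d_model dimension 1
]
scores = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(vocab, scores, k=2)
print(pos)
print(neg)
```

With real weights, vocab would be the full tokenizer vocabulary and k would be 10 to match the two lists above.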
Activation density: 1.704%
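Activation density is usually read as the percentage of sampled tokens on which the feature fires. At 1.704%, that is about 17 tokens per 1,000. The sketch below assumes "fires" means an activation strictly above zero. The activation_density function, its threshold parameter, and the toy data are hypothetical and serve only as illustration.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# feature activation exceeds a threshold (ASSUMPTION: threshold 0.0).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation is above `threshold`."""
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy data: 3 of 200 tokens fire, so the density is 1.5%.
acts = [0.0] * 197 + [2.3, 0.7, 4.1]
print(f"Activation density: {activation_density(acts):.3f}%")  # Activation density: 1.500%
```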