INDEX
Explanations
timestamps in a specific format
Negative Logits
Token       Logit
authority   -0.63
arms        -0.62
olis        -0.60
outright    -0.59
extermin    -0.59
Abrams      -0.59
envelope    -0.58
envelop     -0.58
pill        -0.58
underwear   -0.57
Positive Logits
Token       Logit
00          1.40
30          1.28
59          1.27
05          1.26
06          1.25
04          1.23
09          1.22
08          1.21
07          1.21
58          1.20
Activations Density 0.214%