Explanations
Confession-related words, such as "confessing," "confessed," and "confession."
Negative Logits
Token          Logit
hyde           -0.82
VILLE          -0.76
Elves          -0.74
WOOD           -0.71
OHN            -0.71
horizontally   -0.69
tone           -0.68
hunter         -0.67
DAY            -0.67
Bard           -0.67
Positive Logits
Token          Logit
ention          1.41
icit            1.39
icted           1.39
orted           1.38
ervation        1.31
idential        1.31
osed            1.30
iction          1.30
ortion          1.28
inent           1.26
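Dashboards like this one commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that convention. The function name `top_logit_effects` and its inputs (a decoder vector and a mapping from token to unembedding column) are hypothetical and are not taken from this page.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by the dot product of a feature's decoder direction with
    each token's unembedding column (assumed convention, hypothetical inputs)."""
    # Direct logit effect of the feature on each token.
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    # Sort ascending so the most negative effects come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return negative, positive
```

For example, `top_logit_effects([1.0, 0.0], {"a": [2.0, 0.0], "b": [-1.0, 5.0], "c": [0.5, 1.0]}, k=1)` returns `([("b", -1.0)], [("a", 2.0)])`.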
Activation density: 2.822%
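Activation density is usually read as the share of sampled tokens on which the feature fires. At 2.822%, that is roughly 28 of every 1,000 tokens. Here is a minimal sketch, assuming that a feature counts as active when its activation is strictly above a hypothetical `threshold` of zero. The function name `activation_density` is illustrative only.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition; the threshold value is a hypothetical default)."""
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature fires.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

For example, `activation_density([0.0, 1.2, 0.0, 0.0])` returns `25.0`, because the feature fires on one of four tokens.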