INDEX
Explanations
quotes or dialogue
references to brackets or citations within the text
Negative Logits
Elys        -0.81
nodd        -0.77
therap      -0.74
redu        -0.72
seiz        -0.71
sacrific    -0.69
snipp       -0.68
edIn        -0.67
comprom     -0.66
handlers    -0.65
Positive Logits
sic         1.60
?]          1.43
!]          1.29
REDACTED    1.29
emphasis    1.18
insert      1.17
Pg          1.16
:]          1.14
            1.12
…]          1.12
Activations Density 0.035%