INDEX
Explanations
phrases or sentences enclosed in quotation marks
quotations or quoted speech within the text
Negative Logits
reckoning      -0.75
feas           -0.74
probable       -0.74
flock          -0.72
brim           -0.71
honor          -0.71
possibility    -0.70
rhy            -0.69
halfway        -0.68
toast          -0.68
Positive Logits
…]             2.03
...]           1.96
english        1.30
REDACTED       1.25
Laughs         1.22
Pg             1.22
?]             1.19
!]             1.17
T              1.11
']             1.11
Activations Density 0.025%