INDEX
Explanations
mentions or variations of the name "Horace"
references to horror-related themes or content
Negative Logits
Token          Logit
ãģ¦            -0.71
Leilan         -0.68
eer            -0.67
ãģ®éŃĶ         -0.66
appraisal      -0.60
eers           -0.59
supplemental   -0.59
graded         -0.58
vigorously     -0.57
REDACTED       -0.57
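The tokens ãģ¦ and ãģ®éŃĶ in the negative-logit list look like byte-level BPE display strings rather than corrupted text. The following is a minimal sketch of how such strings could be mapped back to readable characters, assuming the vocabulary uses the GPT-2 style byte-to-unicode table (the page itself does not name the tokenizer, so this is an assumption). The helper names `build_byte_decoder` and `decode_token` are hypothetical.

```python
def build_byte_decoder():
    """Map each display character back to its byte value.

    Assumption: the GPT-2 style table, where printable Latin-1 bytes map to
    themselves and the remaining bytes are shifted to code points 256 and up.
    """
    keep = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    keep_set = set(keep)
    table = {chr(b): b for b in keep}
    shifted = 0
    for b in range(256):
        if b not in keep_set:
            # Non-printable byte: displayed as chr(256 + running offset).
            table[chr(256 + shifted)] = b
            shifted += 1
    return table


BYTE_DECODER = build_byte_decoder()


def decode_token(display):
    """Turn a byte-level display string into the text it encodes."""
    raw = bytes(BYTE_DECODER[ch] for ch in display)
    # Single tokens may hold partial UTF-8 sequences, so do not fail hard.
    return raw.decode("utf-8", errors="replace")


decoded = {tok: decode_token(tok) for tok in ["ãģ¦", "ãģ®éŃĶ"]}
for tok, text in decoded.items():
    print(repr(tok), "->", repr(text))
```

Under that assumption, the two strings decode to the Japanese text て and の魔. Plain ASCII tokens such as `eer` or `graded` pass through unchanged.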
Positive Logits
Token      Logit
izontal    1.63
rible      1.54
izont      1.46
ribly      1.44
izons      1.43
rors       1.34
izon       1.33
ror        1.30
rified     1.17
oscope     1.14
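One way to read the positive-logit tokens is as continuations of a preceding word fragment. The sketch below assumes that fragment is "hor", which is an inference from the two explanations ("Horace", horror-related content) and is not stated anywhere on the page. It simply joins the assumed prefix to each listed token.

```python
# Positive-logit tokens as listed on the page, in order.
suffixes = [
    "izontal", "rible", "izont", "ribly", "izons",
    "rors", "izon", "ror", "rified", "oscope",
]

# Assumption: the feature fires on a "hor" fragment; this prefix is a guess.
PREFIX = "hor"

completions = [PREFIX + s for s in suffixes]
for suffix, word in zip(suffixes, completions):
    print(f"{suffix:<8} -> {word}")
```

If the assumption holds, the list mixes horror-related words (horrible, horror, horrified) with unrelated ones (horizontal, horizon, horoscope), which would suggest the feature tracks the spelling rather than the theme.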
Activations Density 0.024%
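To put the density figure in more concrete terms, here is a small arithmetic sketch. It assumes "density" means the share of sampled tokens on which the feature has a nonzero activation; the page does not define the term.

```python
density_percent = 0.024  # value shown on the page

# Convert the percentage to a plain fraction of tokens.
fraction = density_percent / 100

# Expected activating tokens per million, and average gap between activations.
per_million = fraction * 1_000_000
tokens_per_activation = 1 / fraction

print(f"about {per_million:.0f} activating tokens per million")
print(f"about one activation every {tokens_per_activation:.0f} tokens")
```

Under that reading, the feature is sparse: a few hundred activating tokens per million.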