INDEX
Explanations
hyphenated words expressing negative attributes or excessiveness
negative phrases or themes related to oversimplification and exaggeration
Negative Logits
ulhu         -0.97
Dickinson    -0.92
Burr         -0.87
Juliet       -0.83
Graveyard    -0.78
Cassidy      -0.78
DeV          -0.74
Boone        -0.74
Blow         -0.74
Vid          -0.72
Positive Logits
sized         1.25
regulation    1.11
regulated     1.09
region        1.07
spec          1.06
exc           1.06
emphasis      1.04
matched       1.03
reg           1.03
performing    1.03
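The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way dashboards like this derive such lists, assumed here rather than confirmed by the page, is to project the feature's decoder direction through the model's unembedding matrix and take the extremes. The sketch below shows that ranking step; `logit_attribution`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and the matrices are toy inputs.

```python
import numpy as np

def logit_attribution(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: a feature's direct logit effect is its decoder direction
    (shape: d_model) multiplied by the unembedding matrix W_U
    (shape: d_model x d_vocab).
    """
    scores = decoder_dir @ W_U           # one score per vocabulary token
    order = np.argsort(scores)           # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
neg, pos = logit_attribution(
    np.array([1.0, 0.0]),
    np.array([[0.5, -1.0, 2.0, 0.0],
              [9.0, 9.0, 9.0, 9.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('b', -1.0), ('d', 0.0)]
print(pos)  # [('c', 2.0), ('a', 0.5)]
```

Note that the scores here are raw dot products; whether the dashboard values above are normalized in some further way is not stated on the page.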
Activation Density: 0.046%
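Activation density is typically reported as the share of token positions on which the feature fires at all. The sketch below assumes "fires" means an activation strictly above zero; the `threshold` parameter and the `activation_density` helper are hypothetical, not part of the dashboard.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active on a token when its
    activation is strictly greater than the threshold (default 0).
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# A feature active on 46 of 100,000 tokens has a density of 0.046%.
acts = np.zeros(100_000)
acts[:46] = 0.8
print(f"{activation_density(acts) * 100:.3f}%")  # 0.046%
```

Read this way, a density of 0.046% means the feature is active on roughly 46 of every 100,000 tokens, which marks it as a sparse feature.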