INDEX
Explanations
references to religion and related concepts
Negative Logits
HAEL      -0.86
Pigs      -0.68
enegger   -0.67
slate     -0.65
Wilde     -0.62
BOX       -0.61
buck      -0.60
schild    -0.60
龍契士    -0.59
BOX       -0.57
Positive Logits
igion      1.38
iever      1.32
iance      1.30
iability   1.23
iable      1.19
ativity    1.17
ieving     1.14
atively    1.13
ievers     1.13
atives     1.13
Activations Density: 0.007%