Explanations
the word "weird" and its variations
references to the concept of "weirdness."
Negative Logits
Token      Logit
ptive      -0.85
ptives     -0.84
vation     -0.79
adr        -0.77
apers      -0.77
ILA        -0.76
aders      -0.75
HI         -0.74
REL        -0.73
ailable    -0.72
Positive Logits
Token        Logit
ness         1.05
nesses       0.92
ly           0.91
entimes      0.85
Weird        0.83
weird        0.83
oes          0.79
est          0.79
ishly        0.78
occurrences  0.75
Activations Density 0.014%