INDEX
Explanations
phrases containing the word "weird"
instances of the word "weird" and related variations
Negative Logits
ptive        -0.91
ptives       -0.83
owder        -0.82
apers        -0.80
aders        -0.78
ailable      -0.77
ILA          -0.76
vation       -0.74
cussion      -0.73
utherford    -0.73
Positive Logits
ness         1.00
nesses       0.95
weird        0.93
ly           0.87
Weird        0.87
entimes      0.86
quirks       0.85
anomalies    0.84
ety          0.82
ishly        0.79
Activations Density 0.007%