Explanations
the word "pure" or variations like "purity"
Negative Logits
Token         Logit
apter         -0.93
覚醒          -0.84
challeng      -0.80
notor         -0.77
izoph         -0.73
agements      -0.72
prominently   -0.71
apper         -0.70
therap        -0.69
Peninsula     -0.69
Positive Logits
Token         Logit
bred          1.10
waters        1.05
blood         0.97
pure          0.92
blooded       0.88
ified         0.86
pure          0.81
water         0.81
enstein       0.75
hide          0.74
Activations Density 0.012%