INDEX
Explanations
The adjective "pure" followed by various words
Instances of the word "pure"
Negative Logits
Token        Logit
覚醒         -0.94
apter        -0.90
izoph        -0.87
Peninsula    -0.82
otos         -0.77
therap       -0.77
agements     -0.74
WATCHED      -0.70
Hilton       -0.69
assies       -0.69
Positive Logits
Token        Logit
waters        1.01
bred          0.90
blood         0.90
blooded       0.85
pure          0.81
ified         0.79
anus          0.76
water         0.75
pure          0.74
enstein       0.72
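The two logit tables rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Below is a minimal sketch of how such lists can be produced. It assumes (the page does not say so) that each score is the dot product of the feature's decoder direction with a token's unembedding vector, as is common for sparse-autoencoder dashboards. The function names, the two-dimensional vectors, and the toy vocabulary are all hypothetical.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by projecting the (hypothetical) feature decoder
    direction onto that token's unembedding vector (a plain dot product)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


# Toy example: made-up 2-d vectors, not real model weights.
decoder_dir = [1.0, 0.5]
unembed = {
    "water": [0.6, 0.3],
    "blood": [0.4, 0.8],
    "Hilton": [-0.5, -0.2],
    "apter": [-0.2, -0.6],
}
negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("negative:", negative)
print("positive:", positive)
```

Sorting once and slicing both ends keeps the two tables consistent with each other. A real dashboard would do the same over the full vocabulary, typically with a vectorized matrix product instead of a Python loop.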
Activation density: 0.016%
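Activation density is reported as a percentage of tokens. A density of 0.016% corresponds to the feature firing on about 16 of every 100,000 tokens. The sketch below computes this figure, assuming that "active" means an activation strictly above a threshold of 0. The function name and the threshold default are hypothetical, since the page does not state its exact rule.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    The threshold of 0.0 is an assumption, not taken from the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 16 active tokens out of 100,000 gives a density of 0.016%.
acts = [0.0] * 100_000
for i in range(16):
    acts[i] = 1.0
print(f"{activation_density(acts):.3f}%")
```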