INDEX
Explanations
phrases with contractions involving "won't"
phrases suggesting disbelief or denial
Negative Logits
Palest      -0.83
RAD         -0.80
mosaic      -0.74
Gleaming    -0.73
ctors       -0.73
pmwiki      -0.72
anwhile     -0.70
horizont    -0.64
Buyable     -0.63
{:          -0.62
Positive Logits
¢           0.97
¬           0.95
£           0.94
¼           0.91
½           0.91
¹           0.90
¿           0.90
ı           0.89
ł           0.87
»           0.86
Activations Density 0.167%