Explanations
phrases expressing denial or negation of statements
Negative Logits
culminated    -0.66
kinson        -0.64
culmin        -0.63
icipated      -0.62
culminating   -0.61
adelphia      -0.60
itary         -0.60
iphany        -0.58
littered      -0.58
ilation       -0.57
Positive Logits
Nope          0.74
nor           0.70
大            0.65
èĪ            0.65
Saud          0.64
nor           0.64
çͰ            0.63
SPA           0.63
å¾            0.62
ItemImage     0.61
Activations Density: 0.049%