INDEX
Explanations
phrases where negation is involved
negative contractions suggesting inability or negation
Negative Logits
catentry     -0.81
Offline      -0.78
ewater       -0.72
å            -0.70
orst         -0.69
iosyncr      -0.68
ãĥ¼ãĥĨãĤ£    -0.67
çͰ           -0.64
ocr          -0.64
Ãľ           -0.63
Positive Logits
temptation   0.77
anymore      0.64
sight        0.61
amaz         0.61
FN           0.58
sweets       0.57
Bronx        0.56
grinning     0.56
INS          0.56
RAG          0.56
Activations Density 0.522%