INDEX
Explanations
phrases indicating negative or undesirable situations
statements that emphasize negation or denial
Negative Logits
aneers        -0.88
エル          -0.86
ngth          -0.84
anamo         -0.75
seys          -0.74
ーン          -0.74
alties        -0.74
opez          -0.73
ウス          -0.73
dies          -0.72
Positive Logits
blat          0.77
spac          0.71
continuation  0.71
generational  0.70
happening     0.70
whistlebl     0.69
STEM          0.69
blatant       0.66
chance        0.65
textbook      0.64
Activations Density 0.263%