INDEX
Explanations
phrases indicating something is wrong or needs attention
phrases indicating a sense of something being wrong or missing
Negative Logits
stood       -0.67
fame        -0.65
resume      -0.63
vale        -0.61
suburb      -0.59
Span        -0.59
Rated       -0.58
Documents   -0.58
careers     -0.57
examples    -0.57
Positive Logits
wrong       1.19
wrong       1.00
happening   0.99
missing     0.94
bothering   0.93
rotten      0.92
horribly    0.90
terribly    0.86
missing     0.83
strang      0.79
Activations Density 0.088%
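
To give a rough sense of what a density of 0.088% means, here is a minimal sketch. It assumes that density is the fraction of sampled token positions on which the feature has a nonzero activation; the page does not state the definition, so treat this as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature is active. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 88 of them active.
sample = [0.0] * 99_912 + [1.5] * 88
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.088%"
```

Under that assumption, a density of 0.088% corresponds to the feature firing on roughly 88 out of every 100,000 tokens, which is consistent with a narrow feature that responds to a specific kind of phrase.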