INDEX
Explanations
indications of potential problems or anomalies
phrases indicating a problem or something amiss
NEGATIVE LOGITS
praises         -0.75
icio            -0.64
reviews         -0.63
fame            -0.63
vale            -0.62
Rican           -0.62
Documents       -0.60
ulia            -0.60
predecessors    -0.59
cites           -0.59
POSITIVE LOGITS
wrong           1.28
wrong           1.14
terribly        1.02
horribly        1.00
bothering       0.99
happening       0.95
rotten          0.93
Wrong           0.90
missing         0.90
missing         0.88
Activations Density: 0.104%