INDEX
Explanations
phrases related to sequels or follow-up actions
phrases related to follow-up actions or reports
Negative Logits
nerv       -0.70
DRAG       -0.68
Burr       -0.63
Graveyard  -0.63
Cavern     -0.62
bage       -0.62
Powder     -0.61
Shades     -0.60
Nicotine   -0.60
Smoke      -0.59
Positive Logits
sized          0.93
credit         0.93
advertisement  0.88
sent           0.87
only           0.87
minded         0.86
through        0.86
turned         0.85
feature        0.85
heavy          0.85
Activations Density 0.088%