INDEX
Explanations
recommendations or suggestions being made in text
expressions of recommendations or suggestions
Negative Logits
threat     -0.68
omal       -0.65
exist      -0.65
oise       -0.63
thia       -0.63
nown       -0.62
aptic      -0.62
awks       -0.61
istical    -0.61
hs         -0.60
Positive Logits
checking       1.08
avoiding       1.03
contacting     0.93
downloading    0.92
reading        0.90
skipping       0.90
purchasing     0.89
picking        0.88
keeping        0.87
watching       0.87
Activations Density: 0.067%