Explanations
phrases with a strong emotional or opinionated tone
Negative Logits
Token        Logit
ielding      -0.62
itatively    -0.62
Dayton       -0.60
pockets      -0.58
feet         -0.57
hips         -0.57
Priv         -0.57
Uni          -0.57
present      -0.56
nda          -0.56
Positive Logits
Token        Logit
chy          1.05
ain          0.97
asca         0.90
iner         0.89
unes         0.89
happened     0.88
seems        0.87
self         0.86
wasn         0.85
begg         0.83
Activations Density: 11.707%