INDEX
Explanations
instances where a user is apologizing for not being knowledgeable about a particular topic
repeated phrases conveying a sense of negation or absence
Negative Logits
uez             -0.64
weap            -0.62
Favor           -0.60
later           -0.59
Houses          -0.58
osion           -0.58
abound          -0.57
interstitial    -0.55
Bind            -0.55
Kills           -0.55
Positive Logits
gotten          1.28
been            1.14
gotten          1.08
figured         1.06
been            1.05
slept           1.04
bothered        1.03
forgotten       1.01
mastered        1.00
done            0.99
Activations Density: 0.080%