INDEX
Explanations
phrases indicating clarification or emphasis
phrases expressing opinions or beliefs
Negative Logits
artney       -0.70
izoph        -0.64
ifice        -0.60
ainer        -0.60
icidal       -0.60
substrate    -0.59
lif          -0.58
irlf         -0.57
spawn        -0.57
orum         -0.56
Positive Logits
myself       1.12
xtap         0.77
personally   0.68
congr        0.67
unres        0.67
fortunate    0.66
saw          0.66
gladly       0.63
Ͻ            0.63
poke         0.63
Activations Density 0.470%