INDEX
Explanations
phrases indicating hesitation or reluctance
expressions of reluctance or resistance to discuss certain topics
Negative Logits
iple          -0.69
doubtless     -0.64
eele          -0.62
Millennium    -0.60
unparalleled  -0.58
ggles         -0.57
excellent     -0.57
not           -0.57
emerges       -0.56
zzi           -0.56
Positive Logits
anymore       1.75
anything      1.21
nor           1.12
any           1.09
anybody       1.00
ANY           0.95
anyone        0.93
unless        0.91
anywhere      0.90
anything      0.88
Activations Density 0.547%