INDEX
Explanations
attributions or acknowledgments within a text
references to authorities or citations in discussions
Negative Logits
emate      -0.82
rontal     -0.71
istries    -0.69
Tot        -0.66
neutral    -0.65
chrome     -0.62
blem       -0.61
onite      -0.60
rig        -0.59
obal       -0.58
Positive Logits
aptly      0.68
pointed    0.66
eloqu      0.65
optim      0.65
dictates   0.65
lance      0.64
imum       0.62
Thrones    0.62
iHUD       0.61
Sov        0.60
Activations Density 0.171%
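
The density figure can be read as the share of sampled token positions on which the feature fires. The page does not define the metric, so the sketch below assumes that reading (activation strictly above zero counts as firing). The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled token positions
    on which the feature is active; the source does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 100,000 token positions, 171 of them active.
sample = [0.0] * 99_829 + [1.5] * 171
print(f"{activation_density(sample):.3f}%")
```

Under this assumption, a density of 0.171% would correspond to roughly 171 active positions per 100,000 sampled tokens.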