Explanations
New Auto-Interp: specific words or phrases that are emphasized or stand out in the text
Negative Logits
orge        -0.66
Rated       -0.64
bro         -0.59
Miko        -0.58
AAA         -0.57
gettable    -0.57
etheless    -0.55
essor       -0.54
stood       -0.54
Stage       -0.53
Positive Logits
lest         1.34
hoping       1.08
fearing      1.07
avoid        0.95
precaution   0.94
hopes        0.89
because      0.87
attempt      0.86
ensuring     0.83
appease      0.83
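The two logit lists above are usually read as the feature's direct effect on next-token predictions. A minimal sketch, assuming the scores come from projecting the feature's decoder direction onto each token's unembedding vector (ignoring any final layer norm) and then ranking; the function names and vectors below are hypothetical toy values, not this model's weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def feature_logits(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_logits(logits: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens, strongest first."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return positive, negative


# Hypothetical 2-dimensional toy example (assumed values, not real model weights).
decoder_dir = [1.0, -0.5]
unembed = {
    "lest": [1.0, -0.6],
    "hoping": [0.8, -0.4],
    "bro": [-0.4, 0.4],
    "orge": [-0.5, 0.3],
}
positive, negative = top_logits(feature_logits(decoder_dir, unembed), k=2)
print("Positive:", [(t, round(v, 2)) for t, v in positive])
print("Negative:", [(t, round(v, 2)) for t, v in negative])
```

A positive score means that writing this feature into the residual stream directly raises that token's logit; a negative score means it lowers it.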
Activation Density: 0.883%
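The density figure reports how often the feature fires. A minimal sketch, assuming density is the percentage of sampled tokens whose activation exceeds zero; the `activation_density` helper and the sample counts are hypothetical, chosen only to reproduce a 0.883% figure.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 883 firing tokens out of 100,000.
sample = [0.0] * 99_117 + [1.5] * 883
print(f"Activation density: {activation_density(sample):.3f}%")  # prints "Activation density: 0.883%"
```

At 0.883%, the feature is active on roughly one token in 113, which makes it a fairly sparse feature.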