Explanations
phrases enclosed in quotation marks
phrases denoting quotes or citations
Negative Logits
thur          -0.72
ife           -0.69
²¾            -0.65
forth         -0.64
ason          -0.64
worldly       -0.61
Picture       -0.60
mber          -0.60
imates        -0.59
ãĥ¼ãĥĨãĤ£     -0.58
Positive Logits
/"            0.96
[             0.74
referring     0.72
meaning       0.68
([            0.67
advertisement 0.66
SPONSORED     0.65
ymes          0.65
labeling      0.63
>>\           0.62
Activations Density 0.108%