Explanations
text enclosed in double quotation marks
Negative Logits
Token         Logit
²¾            -0.81
thur          -0.68
ife           -0.68
ãĥ¼ãĥĨãĤ£     -0.68
acas          -0.68
ãĥ¯ãĥ³        -0.66
¬¼            -0.63
ernal         -0.61
worldly       -0.60
anmar         -0.59
Positive Logits
Token          Logit
/"             0.95
[              0.80
meaning        0.73
referring      0.73
{              0.70
SPONSORED      0.69
([             0.66
advertisement  0.66
i              0.64
Encyclopedia   0.62
Activations Density 0.102%