Explanations
titles or headings within text
repeated occurrences of the word "MORE."
Negative Logits
vet      -0.79
aper     -0.79
stood    -0.78
arist    -0.78
riber    -0.76
liest    -0.74
etime    -0.72
reen     -0.71
ibel     -0.71
ee       -0.71
Positive Logits
MORE     1.13
HEAD     0.93
than     0.88
ado      0.86
agascar  0.83
FTWARE   0.79
convol   0.76
VIDEOS   0.76
MORE     0.76
ABOUT    0.74
Activations Density: 0.005%