INDEX
Explanations
specific words related to academic research, such as "thesis" and "dissertation."
instances of the words "thesis" and "dissertation."
Negative Logits
icz        -0.74
obby       -0.72
nels       -0.72
gm         -0.71
theless    -0.71
atility    -0.69
tering     -0.66
bies       -0.65
tn         -0.65
outube     -0.63
Positive Logits
ertation        1.28
thesis          1.16
uates           0.98
dissertation    0.90
ually           0.87
pai             0.83
iary            0.79
doc             0.76
endish          0.73
doctoral        0.70
Activations Density: 0.010%