INDEX
Explanations
references to professors or experts that use the title "Prof"
references to professors
Negative Logits
MENTS      -0.82
leash      -0.77
cruc       -0.72
MENT       -0.69
RANT       -0.69
wolves     -0.68
awoken     -0.68
ashore     -0.68
doors      -0.67
wolf       -0.65
Positive Logits
essors      1.58
essor       1.31
iciency     1.25
iles        1.18
icient      1.13
ession      1.09
ound        1.05
ository     1.00
essions     0.96
inctions    0.95
Activation Density 0.005%
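The page reports activation density as a bare percentage. A common reading, assumed here and not confirmed by the dashboard itself, is the fraction of sampled tokens on which the feature's activation is above zero. Under that reading, 0.005% means the feature fires on roughly 5 of every 100,000 tokens. The minimal Python sketch below uses that assumed definition. The function name `activation_density` and the data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumed definition: "fires" means activation > 0. The dashboard's
    exact rule is not stated.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 200,000 tokens, and the feature fires on 10 of them.
acts = [0.0] * 200_000
for i in range(10):
    acts[i * 20_000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.005%
```

A density this low suggests a very sparse feature, consistent with the narrow "Prof" explanation above.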