INDEX
Explanations
references to professors or academic titles
Negative Logits
MENTS      -0.80
doors      -0.74
Nebula     -0.68
halfway    -0.65
MENT       -0.63
Mali       -0.63
Blazers    -0.63
wolves     -0.62
boat       -0.62
PAL        -0.61
POSITIVE LOGITS
essor      1.70
iciency    1.46
ession     1.40
iles       1.35
essors     1.34
ound       1.30
icient     1.26
essed      1.16
iling      1.15
iled       1.12
Activations Density: 0.003%