INDEX
Explanations
mentions of academic or professional titles and affiliations
sentences or periods that mark the end of statements
Negative Logits
glim          -0.90
metic         -0.86
defe          -0.84
quir          -0.84
prey          -0.81
rall          -0.76
overlooked    -0.74
explan        -0.72
emotion       -0.72
footing       -0.72
Positive Logits
Additionally  1.07
Likewise      1.02
Also          1.01
Similarly     1.01
Meanwhile     1.01
Later         1.01
However       1.01
Along         1.00
Their         1.00
Afterwards    0.95
Activations Density: 1.410%