INDEX
Explanations
common first-person pronouns and phrases indicating personal action or intent
Negative Logits
Jefus             -1.05
itſelf            -1.05
Efq               -1.04
IVEREF            -1.00
StoryboardSegue   -0.98
pleaſure          -0.95
GEBURTSDATUM      -0.94
Shakspeare        -0.94
greateſt          -0.94
fubject           -0.94
Positive Logits
*}\          0.65
use          0.60
far          0.53
[])          0.51
a            0.50
carefully    0.49
نس           0.49
using        0.49
).\          0.49
previously   0.48
Activations Density 0.552%