INDEX
Explanations
references to names and roles associated with authority figures or spokespersons
Negative Logits
ziej            -0.15
ewise           -0.14
ioxide          -0.14
iface           -0.14
ighb            -0.14
gage            -0.13
strstr          -0.13
estate          -0.13
interopRequire  -0.13
ncy             -0.13
Positive Logits
told            0.71
tell            0.57
telling         0.53
Tell            0.52
tells           0.51
告诉            0.50
Tell            0.49
tell            0.48
Tells           0.41
.tell           0.35
Activations Density 0.140%