INDEX
Explanations
mentions of professions or specific roles within various organizations
words related to individuals and groups involved in research or oversight roles
Negative Logits
rection       -0.64
TO            -0.59
corrective    -0.59
kun           -0.58
worthiness    -0.58
Weapon        -0.57
Thumbnail     -0.57
dominates     -0.56
exists        -0.56
SOURCE        -0.55
Positive Logits
folk          1.04
who           1.01
hip           0.93
hips          0.88
paces         0.88
ranging       0.85
wishing       0.83
nationwide    0.83
erv           0.81
whose         0.80
Activations Density: 0.335%