INDEX
Explanations
pronouns and linking verbs
Negative Logits
href         -0.61
OWN          -0.61
enburg       -0.60
nown         -0.60
thriving     -0.59
infringing   -0.59
tering       -0.59
payday       -0.58
skim         -0.58
exploits     -0.58
Positive Logits
ologists     0.82
icians       0.79
ums          0.76
abases       0.76
essors       0.76
ologist      0.76
INAL         0.74
ference      0.72
atively      0.71
ennial       0.69
Activations Density: 0.020%