INDEX
Explanations
names and references to identity of individuals
phrases related to names and identity
Negative Logits
issance     -0.81
productive  -0.78
ACTIONS     -0.76
ractical    -0.74
forums      -0.73
atform      -0.73
nw          -0.72
requires    -0.72
Progress    -0.71
stalls      -0.71
Positive Logits
redacted    1.14
engraved    1.04
tattoo      0.88
initials    0.87
etched      0.87
blurred     0.87
typo        0.86
surname     0.84
dece        0.84
synonymous  0.82
Activations Density 0.382%