INDEX
Explanations
instances of words being replaced with other words or phrases
phrases indicating replacement or substitution
Negative Logits
icious    -0.91
imentary  -0.72
enture    -0.70
antage    -0.70
emetery   -0.67
LIB       -0.65
Streamer  -0.64
PLIED     -0.63
exceeded  -0.61
zai       -0.61
Positive Logits
permanent    0.67
livion       0.67
normal       0.66
obsolete     0.66
Sonny        0.65
rities       0.65
;}           0.64
lifeless     0.63
emort        0.63
nonexistent  0.63
Activations Density: 0.245%