INDEX
Explanations
mentions of replacement or substitution
instances of the word "replaced"
Negative Logits
Fight        -0.76
hawk         -0.74
eb           -0.73
NG           -0.72
raq          -0.71
Import       -0.71
emi          -0.69
Dream        -0.68
WAY          -0.66
apa          -0.65
Positive Logits
replaces      0.91
replaced      0.88
obsolete      0.86
replacing     0.82
replace       0.81
mentation     0.80
ãĥĺ           0.78
destro        0.78
mented        0.78
replacement   0.77
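
The token shown as ãĥĺ in the positive logits is probably not corrupted text. It looks like the raw vocabulary form of a byte-level BPE token. The sketch below assumes the dashboard's model uses a GPT-2-style byte-to-character table, which the page does not state. The helper names `gpt2_byte_table` and `decode_token` are hypothetical and are not part of any dashboard API.

```python
def gpt2_byte_table():
    """Rebuild the byte -> printable-character table used by GPT-2-style BPE."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are remapped to code points 256, 257, ...
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Invert the table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in gpt2_byte_table().items()}


def decode_token(token: str) -> str:
    """Turn a vocabulary-form token back into readable text (assumed GPT-2 table)."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# The token from the list above, written with escapes to avoid ambiguity:
# U+00E3, U+0125, U+013A.
print(decode_token("\u00e3\u0125\u013a"))
print(decode_token("replaces"))
```

Under that assumption the three characters map to the bytes E3 83 98, which is the UTF-8 encoding of the katakana character ヘ. Plain ASCII tokens such as "replaces" decode to themselves.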
Activations Density 0.012%