Explanations
phrases related to covering costs or hiding information
Negative Logits
recomm        -0.71
rious         -0.66
efficients    -0.65
nir           -0.63
friend        -0.61
onwards       -0.60
rever         -0.59
memory        -0.58
Eng           -0.58
isoft         -0.58
Positive Logits
bases          0.94
topic          0.72
gaps           0.71
entirety       0.71
ategories      0.70
phia           0.68
Mellon         0.68
gap            0.67
idential       0.66
eatures        0.65
Activations Density 14.474%