INDEX
Explanations
words related to disguise
instances of the word "gu" in various contexts
Negative Logits
Spectrum    -0.81
ŃĶ          -0.75
hower       -0.73
cycle       -0.68
Parables    -0.65
spect       -0.64
exting      -0.64
eleph       -0.63
Patriot     -0.61
Triumph     -0.61
Positive Logits
arant       1.12
idelines    1.07
arding      1.04
inea        0.95
arded       0.93
vernment    0.92
errilla     0.92
cci         0.91
ossip       0.90
atem        0.90
Activations Density 0.008%