INDEX
Explanations
phrases including names
references to specific individuals and names
Negative Logits
ional               -1.08
ivity               -0.86
=-=-=-=-=-=-=-=-    -0.77
abad                -0.73
IMAGES              -0.71
ariat               -0.68
icks                -0.68
antly               -0.67
ioned               -0.67
iency               -0.65
Positive Logits
Mae                 1.08
emonic              0.98
zie                 0.89
zza                 0.80
ovember             0.78
zn                  0.76
hl                  0.75
vana                0.73
astern              0.73
pling               0.73
Activations Density: 0.007%