Explanations
phrases related to references or quotations in text
possessive forms and related possessive language
Negative Logits
Token        Logit
vg           -0.70
Ͻ            -0.68
endor        -0.64
rador        -0.62
amiya        -0.62
clipse       -0.62
ptives       -0.61
Archangel    -0.61
ifier        -0.61
sted         -0.60
Positive Logits
Token          Logit
own            0.82
penchant       0.73
collective     0.72
fingerprints   0.72
sake           0.66
nai            0.66
ELF            0.66
cause          0.66
birthplace     0.65
guts           0.64
Activations Density: 0.036%