INDEX
Explanations
references to entities or concepts represented by the letter 'Y'
occurrences of the letter 'Y'
Negative Logits
Prelude     -0.68
heads       -0.67
Wonderland  -0.67
arresting   -0.62
Creed       -0.61
coupled     -0.60
Prol        -0.58
Sno         -0.57
prevail     -0.57
Inquis      -0.56
Positive Logits
von    1.25
vette  1.19
onge   1.17
ield   1.13
ves    1.13
ousse  1.12
usra   1.06
arb    1.03
orks   1.02
ulia   1.01
Activations Density 0.022%