INDEX
Explanations
proper nouns or names
occurrences of the name "Wil"
Negative Logits
yrinth        -0.83
Butterfly     -0.81
ItemTracker   -0.80
oral          -0.76
shatter       -0.71
æĸ¹           -0.71
Parables      -0.71
crush         -0.70
âĶģ           -0.69
xual          -0.66
Positive Logits
mot           1.11
lem           0.99
fred          0.99
Whe           0.95
ibr           0.92
bur           0.90
cox           0.90
lette         0.90
mington       0.90
ters          0.88
Activations Density 0.015%