INDEX
Explanations
instances of the word "a" across the text
Negative Logits
aura      -0.77
alties    -0.73
lements   -0.71
isson     -0.69
upon      -0.68
olicy     -0.68
Links     -0.68
alion     -0.68
alias     -0.68
orders    -0.67
Positive Logits
beginner     1.01
novice       0.95
guy          0.92
politician   0.91
newborn      0.89
typical      0.89
woman        0.88
fledgling    0.86
nutshell     0.86
lot          0.84
Activations Density: 0.168%