INDEX
Explanations
personal possessive pronouns
references to the audience or the reader
Negative Logits
forth     -0.91
nown      -0.76
iors      -0.75
minus     -0.74
Serv      -0.73
laus      -0.72
schild    -0.71
Hort      -0.70
acca      -0.69
icably    -0.68
Positive Logits
own           0.78
browser       0.77
firewall      0.74
fingertips    0.73
doorstep      0.70
respective    0.70
ears          0.69
discretion    0.69
subscription  0.69
device        0.68
Activations Density: 0.028%