Explanations
pronouns or names followed by verbs indicating a request or desire
references to people and their desires or expectations
Negative Logits
VIDIA         -0.73
everal        -0.66
ibe           -0.65
berra         -0.61
rium          -0.60
ggles         -0.58
guiActiveUn   -0.58
ationally     -0.57
Attempts      -0.57
riad          -0.56
Positive Logits
to             1.08
badly          0.89
deported       0.85
gone           0.83
to             0.82
cleaned        0.80
punished       0.78
TO             0.77
spoiled        0.75
eaten          0.74
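Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k highest and k lowest tokens. The sketch below assumes that procedure; the function name `logit_effects`, the argument shapes, and the four-token toy vocabulary are hypothetical, and it ignores any final LayerNorm handling a real pipeline may apply.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction onto the vocabulary.

    decoder_dir: shape (d_model,); W_U: shape (d_model, vocab_size).
    Returns (top_positive, top_negative), each a list of (token, value) pairs.
    """
    logits = decoder_dir @ W_U              # direct effect on each token's logit
    order = np.argsort(logits)              # token indices, ascending by value
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]        # most negative first
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]  # most positive first
    return pos, neg

# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
W_U = np.array([[0.5, -0.2, 1.0, -0.7],
                [0.3,  0.3, 0.3,  0.3]])
pos, neg = logit_effects(np.array([1.0, 0.0]), W_U, ["a", "b", "c", "d"], k=2)
print(pos)  # [('c', 1.0), ('a', 0.5)]
print(neg)  # [('d', -0.7), ('b', -0.2)]
```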
Activation density: 0.117%
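Activation density is usually reported as the share of tokens in an evaluation set on which the feature's activation is nonzero. A minimal sketch under that assumption follows; the helper `activation_density` and its zero threshold are hypothetical, and the toy array (117 active tokens out of 100,000) is made up purely to show how a figure like 0.117% arises.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy data: 117 of 100,000 tokens have a positive activation.
acts = np.zeros(100_000)
acts[:117] = 1.0
print(f"{100 * activation_density(acts):.3f}%")  # 0.117%
```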