INDEX
Explanations
personal pronouns used for self-referencing actions
phrases that contain the word "themselves"
Negative Logits
grade      -0.75
amia       -0.73
order      -0.70
execute    -0.69
ulton      -0.68
aster      -0.67
asia       -0.67
ritz       -0.66
pour       -0.66
pak        -0.65
Positive Logits
selves      1.18
tremend     0.98
selves      0.93
self        0.91
exting      0.88
themselves  0.87
conduc      0.86
proport     0.84
exha        0.82
behavi      0.79
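The page does not say how the two logit lists were produced. A common approach in SAE dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the highest- and lowest-scoring vocabulary tokens. The sketch below assumes that approach; `logit_effects`, `direction`, `w_u`, and `vocab` are hypothetical names, and any scaling or normalization the dashboard may apply is not modeled.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    direction: Sequence[float],
    w_u: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: score every vocab token by direction @ W_U.

    direction: the feature's decoder vector (length d_model), assumed.
    w_u: unembedding matrix; w_u[i][j] maps residual dim i to token j.
    Returns (top-k most positive, top-k most negative) (token, score) pairs.
    """
    d_model = len(direction)
    # Dot product of the decoder direction with each token's unembedding column.
    scores = [
        sum(direction[i] * w_u[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative
```

With a toy two-dimensional direction `[1.0, -0.5]` and three made-up tokens, `logit_effects([1.0, -0.5], [[1.0, 0.0, -1.0], [0.0, 2.0, 0.5]], ["tok_a", "tok_b", "tok_c"], k=1)` returns `([("tok_a", 1.0)], [("tok_c", -1.25)])`. A real model's vocabulary can tokenize the same word fragment more than one way (for example with and without a leading space), so the same string may appear twice in such a list.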
Activation Density: 0.027%
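Activation density is usually read as the fraction of token positions on which the feature is active, so 0.027% corresponds to roughly 27 of every 100,000 tokens. The sketch below assumes "active" means an activation strictly greater than zero; the page does not state the threshold or the dataset used, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 27 firing positions out of 100,000 tokens gives a density of about 0.027%.
acts = [0.0] * 99_973 + [1.0] * 27
print(f"{activation_density(acts) * 100:.3f}%")
```

A very low density like this suggests a sparse, specialized feature, which is consistent with the narrow "themselves"/self-reference explanations above.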