INDEX
Explanations
references to physical objects or specific details within a larger context
references to vulnerability and oppression
Negative Logits
amount        -0.54
SEA           -0.51
Prel          -0.49
inding        -0.48
Ori           -0.48
described     -0.48
uned          -0.48
Environment   -0.46
landowners    -0.45
shown         -0.45
Positive Logits
anymore       1.12
;)            0.96
?'            0.94
!",           0.93
haha          0.93
:-)           0.92
?",           0.90
!'            0.90
someday       0.89
:)            0.89
Activations Density: 1.284%