INDEX
Explanations
phrases indicating personal involvement or actions
expressions of personal experience or opinions
Negative Logits
iens -0.75
Stall -0.73
Definitions -0.72
Emin -0.71
LY -0.68
xual -0.67
Clover -0.66
Sparkle -0.66
eland -0.65
Klu -0.65
Positive Logits
identifiable 1.21
benefited 0.89
minded 0.88
ised 0.87
invested 0.84
offended 0.84
insulted 0.80
intervened 0.79
opposed 0.78
acquainted 0.77
Activations Density 0.021%