INDEX
Explanations
phrases mentioning personal connections or actions
the word "personally" and its variations, indicating a focus on personal viewpoints or experiences
Negative Logits
Token           Logit
Definitions     -0.74
eland           -0.70
Emin            -0.69
period          -0.66
Ends            -0.66
Surveillance    -0.65
LY              -0.65
Dispatch        -0.65
Stall           -0.64
Movement        -0.63
Positive Logits
Token           Logit
identifiable    1.13
benefited       0.94
invested        0.90
vou             0.89
apologized      0.87
insulted        0.86
thanked         0.86
offended        0.85
intervened      0.85
ised            0.85
Activations Density: 0.017%