INDEX
Explanations
actions related to using or interacting with something
references to personal experiences and reactions
Negative Logits
ahime        -0.78
recruitment  -0.70
ENCY         -0.69
immigrant    -0.68
departures   -0.68
roots        -0.68
Attacks      -0.66
treaties     -0.65
abduction    -0.64
kidnapping   -0.64
Positive Logits
amazed       1.15
regretted    1.00
amused       1.00
marvel       1.00
thanked      0.99
laughed      0.99
regret       0.99
applause     0.95
loved        0.93
afterward    0.91
Activations Density 0.564%