Explanations
phrases related to personal anecdotes and experiences
Negative Logits
noon          -0.75
rocket        -0.67
acters        -0.63
differential  -0.63
disabling     -0.62
iencies       -0.60
NAT           -0.60
Measure       -0.59
menstrual     -0.58
atible        -0.56
Positive Logits
replied       1.11
exclaimed     1.11
said          1.11
wrote         1.10
joked         1.07
remarked      1.03
laughed       1.02
said          1.00
chuckled      1.00
says          0.99
Activations Density: 0.056%