Explanations
phrases related to activities or events
instances of the verb "had" and its various forms indicating past actions or experiences
NEGATIVE LOGITS
!'          -0.77
!'"         -0.64
?'          -0.63
.'          -0.62
?'"         -0.59
!!!         -0.57
Kills       -0.56
.'"         -0.56
!!          -0.56
Printed     -0.56
POSITIVE LOGITS
"                 1.21
"...              1.12
"'                1.11
"…                1.10
"[                1.09
misunderstood     1.03
''                1.02
unfairly          0.93
"(                0.92
deserved          0.91
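The two lists rank output tokens by how strongly this feature pushes their logits down or up. A minimal sketch of one common way such lists are produced, assuming (this page does not confirm it) that each score is the dot product of the feature's decoder direction with the corresponding column of the model's unembedding matrix. The function name `top_logit_effects` and the toy weights are hypothetical, and plain Python lists stand in for real tensors.

```python
def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (assumed method).

    decoder_row: d_model floats, the feature's decoder direction (hypothetical input)
    unembed:     d_model rows, each with one float per vocabulary token (W_U)
    vocab:       token strings, one per column of unembed
    """
    d_model = len(decoder_row)
    # effect[t] = sum_j decoder_row[j] * unembed[j][t]
    effects = [
        sum(decoder_row[j] * unembed[j][t] for j in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_row = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 1.0, 0.0],
    [0.4, 0.6, -0.2, 0.0],
]
vocab = ["a", "b", "c", "d"]
neg, pos = top_logit_effects(decoder_row, unembed, vocab, k=1)
print(neg, pos)
```

Here token "b" receives the most negative effect (about -0.7) and token "c" the most positive (about 1.1), mirroring how the dashboard splits tokens into negative and positive lists.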
Activations Density 0.380%
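Activation density is the share of token positions on which the feature fires, so 0.380% corresponds to roughly 38 firings per 10,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero; the helper `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 38 firing positions out of 10,000 tokens.
acts = [0.0] * 9962 + [1.5] * 38
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.380%
```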