INDEX
Explanations
proper nouns related to individuals and their actions
past tense verbs indicating actions or statements made by individuals
Negative Logits
Token      Logit
bery       -0.72
uce        -0.68
eem        -0.68
hack       -0.68
umping     -0.64
jam        -0.63
rush       -0.62
strand     -0.60
Might      -0.60
bush       -0.59
Positive Logits
Token      Logit
consulted  1.20
spoken     1.12
received   1.09
heard      1.04
listened   1.00
contacted  1.00
been       1.00
begun      0.99
doubts     0.99
reviewed   0.97
Activations Density 0.182%
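
The density figure can be read as the share of sampled tokens on which the feature fires. The following is a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the toy activations are all hypothetical and are not taken from the dashboard, which may compute the number differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of tokens on which the
    feature is active; the dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy data (hypothetical): the feature fires on 2 of 1000 tokens.
toy = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(toy):.3f}%")  # prints "0.200%"
```

Under this reading, a density of 0.182% would mean the feature is active on roughly 2 tokens in every 1,000 sampled.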