INDEX
Explanations
mentions of specific actions or events that have caused a significant impact or controversy
phrases that express the occurrence of events or actions
Negative Logits
eras        -0.66
Corpus      -0.66
hand        -0.63
territ      -0.62
remember    -0.62
remembers   -0.61
respons     -0.61
oit         -0.61
Pt          -0.60
walking     -0.60
Positive Logits
been        1.21
garnered    1.17
been        1.13
yielded     0.96
become      0.96
resulted    0.93
gotten      0.93
lasted      0.92
sparked     0.91
begun       0.90
Activations Density 0.222%