INDEX
Explanations
phrases indicating the beginning or initiation of events or actions
the word "what" in various contexts
Negative Logits
fixation        -0.69
fix             -0.63
fixing          -0.61
abiding         -0.60
inserting       -0.59
supporting      -0.59
concentrating   -0.58
running         -0.57
hearing         -0.57
receiving       -0.57
Positive Logits
soever          1.08
happens         1.06
happened        1.04
transpired      1.03
amounted        1.02
constitutes     0.96
wikipedia       0.84
Downloadha      0.83
appears         0.82
resembles       0.82
Activations Density: 0.080%
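
As a rough illustration of what an activation density figure like the one above can mean, the sketch below assumes it is the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 9992 + [1.3, 0.2, 4.1, 0.7, 2.2, 0.05, 3.0, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.080% would correspond to the feature firing on roughly 8 of every 10,000 tokens.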