INDEX
Explanations
phrases emphasizing a specific point or action
the word "what" in various contexts
Negative Logits
robe       -0.88
ardless    -0.74
lich       -0.67
trop       -0.64
eer        -0.63
say        -0.63
ubs        -0.62
astical    -0.62
ways       -0.61
UNE        -0.60
POSITIVE LOGITS
happens          1.25
happened         1.21
separates        1.03
soever           1.02
transpired       0.96
distinguishes    0.89
happ             0.87
motiv            0.84
Happ             0.83
bothers          0.82
Activations Density 0.060%