Explanations
phrases that pose questions or express doubts
Negative Logits
quer      -0.68
hack      -0.64
gain      -0.60
quin      -0.59
ument     -0.59
query     -0.57
athon     -0.56
brid      -0.55
Radius    -0.54
iy        -0.54
Positive Logits
they      0.85
THEY      0.79
atta      0.68
Filename  0.68
"[        0.63
he        0.62
she       0.61
they      0.60
="/       0.60
ãĢİ       0.60
Activations Density: 0.187%
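
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical; the page does not state how its 0.187% was derived, so this is an illustration under those assumptions rather than the dashboard's actual method.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 187 of them active.
sample = [1.0] * 187 + [0.0] * (100_000 - 187)
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.187%"
```

Under this reading, a density of 0.187% corresponds to roughly 187 active positions per 100,000 tokens.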