INDEX
Explanations
phrases related to observations or insights
phrases that indicate observation or witnessing events
Negative Logits
nor      -0.77
ranged   -0.76
toe      -0.74
orth     -0.72
raft     -0.68
save     -0.67
anium    -0.66
ãĥĦ      -0.66
phe      -0.66
ê        -0.65
Positive Logits
examples      1.16
parallels     1.14
similarities  1.13
firsthand     1.10
glimps        1.09
instances     1.04
hints         1.03
signs         1.02
flashes       1.00
how           0.98
Activations Density: 0.142%