INDEX
Explanations
phrases that indicate a revelation or realization
the word "that" in various contexts
Negative Logits
aukee     -0.70
raq       -0.63
gur       -0.63
orah      -0.63
waters    -0.63
stead     -0.62
ield      -0.60
ien       -0.60
thro      -0.57
tails     -0.57
Positive Logits
pesky      0.84
they       0.80
fateful    0.79
cher       0.77
THEY       0.71
>>>>       0.66
there      0.66
chy        0.66
izoph      0.65
although   0.64
Activations Density: 0.297%