INDEX
Explanations
phrases starting with "That is how" followed by a strong opinion or personal perspective
phrases that indicate explanations or descriptions of processes
Negative Logits
Quest       -0.63
room        -0.62
effects     -0.60
artifacts   -0.60
wear        -0.57
hereafter   -0.57
yak         -0.57
actor       -0.57
iston       -0.56
oris        -0.56
Positive Logits
beit        0.71
ativity     0.69
pedia       0.69
HCR         0.68
erest       0.67
soever      0.66
bill        0.64
ls          0.62
pher        0.61
much        0.61
Activations Density 0.030%