INDEX
Explanations
phrases indicating the end or conclusion of something
instances of the word "end."
Negative Logits
ppo       -0.84
chy       -0.67
issance   -0.66
CHA       -0.66
kers      -0.65
Byrd      -0.63
kson      -0.61
IGHTS     -0.61
kaya      -0.60
aution    -0.60
Positive Logits
angered    1.07
lich       1.01
angering   0.99
owment     0.99
orph       0.95
ocrine     0.92
ocrin      0.90
urance     0.88
end        0.86
orse       0.83
Activations Density: 0.025%