INDEX
Explanations
instances of the word "Take" indicating suggestions or prompts
Negative Logits
Token       Logit
_FAULT      -0.15
na          -0.15
vert        -0.14
ramp        -0.14
conduct     -0.14
rita        -0.14
avanaugh    -0.14
shed        -0.14
vester      -0.13
igo         -0.13
Positive Logits
Token       Logit
óc          0.17
eyle        0.15
ÏĢοί        0.15
ERY         0.15
%#          0.15
ifornia     0.14
illation    0.14
ayah        0.14
loub        0.14
çīĩ         0.14
Activations Density: 0.021%