INDEX
Explanations
verbs that suggest deception
verbs and phrases that indicate personal interactions or relationships
Negative Logits
Arrow       -0.81
irmation    -0.68
encing      -0.65
Quote       -0.65
irming      -0.62
ravel       -0.62
hatt        -0.61
ruction     -0.61
hern        -0.60
onement     -0.60
Positive Logits
©¶æ¥µ         0.74
aciously      0.68
passionately  0.66
ij士           0.65
himself       0.65
herself       0.64
menstru       0.64
tirelessly    0.62
worshipped    0.62
vae           0.61
Activations Density: 0.333%