INDEX
Explanations
instances of the word "pretend" or variations of it
instances of the word "pretend" and its variations
Negative Logits
GOODMAN     -0.67
cutting     -0.66
ighth       -0.63
vez         -0.62
sbm         -0.61
cedented    -0.61
stown       -0.60
cent        -0.60
Citation    -0.58
winner      -0.58
Positive Logits
innocence   1.04
ignorance   0.78
inet        0.70
allegiance  0.69
pas         0.68
forgot      0.68
insanity    0.68
otherwise   0.64
idently     0.64
inco        0.63
Activations Density: 0.053%