Explanations
responses indicating a direct personal experience or perspective
Negative Logits
ibaba      -1.15
hillary    -1.15
artments   -1.00
inge       -1.00
netflix    -1.00
selage     -0.98
eeper      -0.97
erva       -0.97
Thumbnail  -0.96
dos        -0.95
Positive Logits
damned     0.93
uncond     0.93
Clive      0.88
Pers       0.84
overr      0.84
Hercules   0.82
Bers       0.82
Cann       0.82
Carth      0.81
Lucky      0.81
Activations Density: 1.343%