INDEX
Explanations
references to a specific individual named Wes
Negative Logits
Token    Logit
Cald     -0.17
azor     -0.16
pton     -0.15
TRIES    -0.15
aine     -0.15
tea      -0.15
aska     -0.15
ilty     -0.15
rie      -0.14
Arb      -0.14
Positive Logits
Token    Logit
nesday   0.27
sex      0.22
ley      0.22
LEY      0.22
layan    0.21
Studi    0.21
SEX      0.20
entlich  0.20
leys     0.20
ker      0.17
Activations Density: 0.005%