Explanations
mentions of a person dominating or excelling in a particular field
Negative Logits
potion          -0.62
robe            -0.56
illy            -0.52
masturbation    -0.50
analys          -0.50
equation        -0.49
contam          -0.49
hello           -0.49
hua             -0.48
Puzzles         -0.47
Positive Logits
igator          1.12
sorts           1.04
igators         1.02
usions          0.99
kinds           0.94
ocating         0.94
uring           0.93
uding           0.86
usion           0.84
ocative         0.84
Activations Density: 0.123%