INDEX
Explanations
mentions of a specific place name
references to the word "Glad."
Negative Logits
Mutant        -0.71
Moore         -0.70
temper        -0.68
session       -0.66
Nero          -0.63
Hawkins       -0.63
practice      -0.63
square        -0.62
Apocalypse    -0.61
imp           -0.61
Positive Logits
lad           4.64
Lad           1.54
Glad          1.30
lav           1.17
adder         1.05
lass          1.02
laden         1.02
lol           1.02
lam           1.01
los           1.01
Activations Density 0.010%