Explanations
proper nouns, specifically names, particularly first names
the term "Col" followed by numerical identifiers, likely indicating colleges or institutions
Negative Logits
Ducks      -0.67
Werewolf   -0.65
forth      -0.64
retract    -0.60
Canucks    -0.59
derog      -0.58
Californ   -0.57
puck       -0.57
Language   -0.57
doors      -0.56
Positive Logits
ossal      1.49
ombo       1.35
umbo       1.27
leen       1.27
onel       1.25
onial      1.18
ophon      1.17
oured      1.11
angelo     1.07
ored       1.02
Activations Density: 0.014%