INDEX
Explanations
proper nouns, specifically names of people or places that start with "Th"
the presence of the name "Thad" or similar variations
Negative Logits
Reloaded       -0.91
の魔           -0.91
ITED           -0.88
76561          -0.87
assetsadobe    -0.83
keeping        -0.76
キ             -0.76
か             -0.75
915            -0.74
Spoiler        -0.73
Positive Logits
irteen         1.08
reshold        1.05
irst           1.03
umb            1.03
orne           1.01
ought          1.01
istle          1.00
umbnail        0.98
orns           0.98
orough         0.97
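The positive-logit tokens read like word continuations, which seems to fit the "Th" explanations above. A minimal sketch, assuming the feature fires on a preceding "Th" token (an assumption, not something the dashboard states), that joins the prefix to each listed token:

```python
# Positive-logit tokens copied from the list above, in the same order.
positive_tokens = [
    "irteen", "reshold", "irst", "umb", "orne",
    "ought", "istle", "umbnail", "orns", "orough",
]

# Assumption: the activating token is "Th"; the dashboard does not say so directly.
PREFIX = "Th"

# Prepend the assumed prefix to each promoted continuation.
completions = [PREFIX + token for token in positive_tokens]

for word in completions:
    print(word)
```

Under that assumption, every promoted token completes an ordinary word or name (for example "Threshold" or "Thorne"), whereas the negative-logit tokens do not form words when prefixed the same way.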
Activations Density 0.015%