INDEX
Explanations
mentions of specific names, likely related to a news story or event
proper nouns related to people, likely focusing on names and roles
Negative Logits
ij          -0.65
orses       -0.63
ccording    -0.61
iggurat     -0.61
rises       -0.61
utes        -0.60
olulu       -0.60
Pradesh     -0.59
SIZE        -0.59
cumbers     -0.58
Positive Logits
extraord    0.99
lein        0.90
geist       0.88
iffe        0.86
lain        0.85
jee         0.84
bilt        0.84
="#         0.82
sonian      0.81
ufact       0.81
Activations Density 0.055%