Explanations
proper nouns and organizations related to politics, academia, and possibly specific regions
references to specific names, particularly related to notable individuals or entities
Negative Logits
Token      Logit
holders    -1.04
holder     -0.82
mble       -0.72
erness     -0.71
itures     -0.70
rights     -0.69
ages       -0.68
hetical    -0.67
comings    -0.66
igators    -0.66
Positive Logits
Token      Logit
upiter     1.01
unction    1.00
oint       0.95
ournals    0.93
utsu       0.91
igsaw      0.88
ealous     0.88
avascript  0.86
ernaut     0.86
ihad       0.84
Activations Density: 0.094%
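
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the percentage of sampled tokens on which the feature fires above a threshold. This is an assumption about the metric, not a description of how the dashboard calculates it. The function name `activation_density`, the zero threshold, and the toy data are all hypothetical and chosen only to reproduce a value of the same magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature is active. The dashboard's exact definition is not
    stated in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: the feature fires on 94 of 100,000 tokens.
sample = [1.5] * 94 + [0.0] * 99_906
print(f"{activation_density(sample):.3f}%")  # prints "0.094%"
```

Under this reading, a density of 0.094% would mean the feature is active on roughly one token in a thousand, which is typical of a sparse feature.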