Explanations
proper nouns related to a specific person or location
references to a specific individual, likely a political figure
Negative Logits
Token      Logit
saves      -0.62
Bound      -0.62
exempt     -0.62
expecting  -0.61
matched    -0.60
blank      -0.60
map        -0.59
self       -0.58
seed       -0.58
coding     -0.58
Positive Logits
Token      Logit
lus        4.99
lis        1.27
lux        1.23
leck       1.14
alus       1.13
lio        1.12
xus        1.09
rus        1.07
ilus       1.05
laus       1.03
Activations Density: 0.026%