Explanations
names and titles of political figures
proper nouns or specific names
Negative Logits
Vera      -0.86
enthusi   -0.81
Byr       -0.75
LIA       -0.74
Victoria  -0.73
phys      -0.73
COURT     -0.70
SEN       -0.70
Plaint    -0.69
ãĢĮ       -0.69
Positive Logits
ink       1.10
oop       1.07
arp       1.06
ap        1.04
isk       1.04
inkle     1.02
ape       1.02
acks      1.02
ip        1.01
opp       0.99
Activations Density: 0.245%