INDEX
Explanations
names and titles of people, especially in a professional setting
Negative Logits
interrupted   -0.74
antha         -0.67
etheless      -0.63
sweep         -0.62
bottleneck    -0.62
ailability    -0.62
ipolar        -0.61
pressures     -0.61
rha           -0.61
olicy         -0.60
Positive Logits
brate         1.54
brates        1.50
ller          1.21
llers         1.19
levision      1.18
achers        1.11
llo           1.09
lla           1.07
achable       1.05
lli           1.05
Activations Density 0.023%