Explanations
words related to people's names and titles
mentions of specific names or entities
Negative Logits
Token       Logit
Publisher   -0.75
washer      -0.71
ModLoader   -0.69
pread       -0.68
ħĭ          -0.66
Phill       -0.65
riad        -0.63
wallet      -0.62
pardon      -0.61
mble        -0.61
Positive Logits
Token       Logit
uation      1.20
uations     1.11
uated       1.03
uate        1.02
uating      1.01
uates       0.98
uable       0.88
anche       0.87
onge        0.87
arial       0.83
Activations Density: 0.011%