INDEX
Explanations
proper nouns
mentions of specific names or titles related to individuals or organizations
Negative Logits
orthy     -0.84
fare      -0.77
OFF       -0.77
Tayyip    -0.73
awa       -0.73
NPR       -0.71
oria      -0.69
aiden     -0.67
oided     -0.66
ãĤ®       -0.65
Positive Logits
iliary    0.82
cffff     0.78
omething  0.70
bats      0.69
bluff     0.68
iosyn     0.65
geoning   0.65
ly        0.64
esville   0.63
vier      0.63
Activations Density 0.070%