Explanations
names of specific individuals or organizations
names and entities related to specific organizations or individuals
Negative Logits
Token      Logit
IBLE       -0.86
xual       -0.80
entials    -0.78
ential     -0.75
encers     -0.74
ework      -0.72
Ö¼         -0.72   (byte-level token for bytes D6 BC, i.e. U+05BC, Hebrew point dagesh)
iments     -0.71
phrine     -0.70
drops      -0.70
Positive Logits
Token      Logit
kat        0.82
won        0.81
istani     0.80
lov        0.78
patrick    0.78
orea       0.77
lar        0.76
rieg       0.76
ota        0.75
pps        0.73
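One common way such logit lists are produced, which we assume here rather than take from this page, is to project the feature's decoder direction through the model's unembedding matrix and sort the resulting per-token weights. The sketch below uses plain Python lists; the names `logit_weights`, `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical, and real dashboards may normalize or fold in layer norm differently.

```python
def logit_weights(decoder_dir, W_U):
    """Assumed logit effect per token: W_U^T @ decoder_dir.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    W_U: list of d_model rows, each a list of vocab_size floats
         (hypothetical unembedding matrix).
    """
    if len(decoder_dir) != len(W_U):
        raise ValueError("decoder_dir length must equal the number of rows in W_U")
    vocab_size = len(W_U[0])
    # Dot product of the direction with each vocabulary column of W_U.
    return [
        sum(decoder_dir[j] * W_U[j][i] for j in range(len(decoder_dir)))
        for i in range(vocab_size)
    ]


def top_logit_effects(weights, vocab, k=10):
    """Return (most_negative, most_positive) lists of (token, weight) pairs."""
    if len(weights) != len(vocab):
        raise ValueError("weights and vocab must have the same length")
    pairs = list(zip(vocab, weights))
    most_negative = sorted(pairs, key=lambda p: p[1])[:k]
    most_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return most_negative, most_positive


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    decoder_dir = [1.0, 0.0]
    W_U = [[0.5, -0.2, 0.9], [3.0, 3.0, 3.0]]
    w = logit_weights(decoder_dir, W_U)
    neg, pos = top_logit_effects(w, vocab, k=2)
    print(neg)  # [('b', -0.2), ('a', 0.5)]
    print(pos)  # [('c', 0.9), ('a', 0.5)]
```

With `k=10`, the two returned lists correspond in shape to the Negative Logits and Positive Logits columns above.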
Activation density: 0.065%
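Activation density is usually read as the fraction of tokens on which the feature is active; we assume here that "active" means an activation strictly above zero, though a dashboard may use another threshold. Under that assumption, 0.065% corresponds to about 65 firing tokens per 100,000. The helper name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Toy data: 65 firing tokens out of 100,000.
    acts = [0.0] * 99_935 + [1.2] * 65
    print(f"{activation_density(acts):.3%}")
```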