INDEX
Explanations
mentions of notable individuals' names
words related to negative connotations or undesirable situations
Negative Logits
pection    -0.66
SPONSORED  -0.65
owship     -0.65
SAY        -0.62
Takeru     -0.61
ster       -0.59
footing    -0.59
cling      -0.58
GBT        -0.58
SPA        -0.57
Positive Logits
vous       0.95
ÅĤ         0.87
owicz      0.77
acan       0.76
henko      0.75
ewski      0.72
inis       0.71
án         0.70
Mehran     0.70
adesh      0.69
Activations Density 0.235%