INDEX
Explanations
mentions of specific names or terms with varied linguistic structures
proper nouns, especially names and brands
Negative Logits
BI        -0.55
¶ħ        -0.53
CONT      -0.52
semb      -0.51
taboola   -0.51
CONTR     -0.50
tags      -0.50
ASP       -0.49
(blank)   -0.49
recru     -0.48
Positive Logits
unia      0.67
idia      0.67
hess      0.66
veland    0.61
REDACTED  0.60
ë         0.58
enges     0.58
ragon     0.57
inces     0.57
ledge     0.56
Activations Density 0.623%