Explanations
mentions of a specific word 'Naz'
references to a specific individual named "Naz."
Negative Logits
ACTED      -0.75
#$         -0.74
IVES       -0.73
lessly     -0.72
ENGTH      -0.71
boiling    -0.66
ESSION     -0.66
tenance    -0.65
IVE        -0.64
OME        -0.64
Positive Logits
areth      1.18
ril        1.00
Naz        1.00
aji        0.98
imet       0.97
oche       0.96
emonic     0.91
anas       0.90
oid        0.88
ollah      0.87
Activations Density 0.005%