INDEX
Explanations
references to specific locations or places
Negative Logits
Token      Logit
rape       -0.18
eters      -0.17
arkan      -0.17
thew       -0.16
ishly      -0.16
landa      -0.16
theless    -0.15
sis        -0.15
rep        -0.15
raq        -0.15
Positive Logits
Token      Logit
bos        0.36
HOLDER     0.27
OfBirth    0.21
able       0.21
holders    0.21
ful        0.21
lessness   0.21
holder     0.21
-holder    0.21
/time      0.20
Activations Density: 0.087%