Explanations
the substring "ab" followed by a single-digit activation value
occurrences of the abbreviation "ab" in the text
Negative Logits
Token     Logit
virtue    -0.80
nomine    -0.74
Perse     -0.69
Coco      -0.65
bilt      -0.65
Celest    -0.63
backer    -0.61
Uriel     -0.60
Izan      -0.60
Patriot   -0.59
Positive Logits
Token     Logit
stract    1.34
bing      1.11
yrinth    1.10
urger     1.09
raham     1.04
road      1.03
dullah    1.03
bed       1.00
ecause    0.98
riel      0.96
Activations Density: 0.038%