Explanations
mentions of the word "zone" and its variations
Negative Logits
angelo    -0.17
enor      -0.17
intree    -0.16
acro      -0.16
essen     -0.15
feit      -0.15
paces     -0.14
antage    -0.14
<!--[     -0.14
arhus     -0.14
Positive Logits
mos       0.18
naÄįenÃŃ  0.17
anian     0.17
anne      0.17
ç¶        0.16
arks      0.16
bek       0.15
eki       0.15
iris      0.15
naÄį      0.15
Activations Density: 0.005%