INDEX
Explanations
references to websites or email addresses ending in ".wh"
instances of the word "who."
Negative Logits
²¾            -0.72
Ankara        -0.65
Lazarus       -0.65
Mali          -0.63
Medina        -0.59
rein          -0.59
improvised    -0.59
Rein          -0.58
Universe      -0.58
ITION         -0.57
Positive Logits
wh            4.19
Wh            2.37
WH            1.74
Wh            1.65
wh            1.45
white         1.32
WH            1.30
who           1.29
Whit          1.24
whe           1.23
Activations Density: 0.008%