INDEX
Explanations
words starting with 'wh'
occurrences of the word "wh."
Negative Logits
rella     -0.75
Rein      -0.73
alia      -0.71
Desert    -0.66
tera      -0.65
Celeb     -0.65
rez       -0.65
ovic      -0.64
Variant   -0.64
Romania   -0.64
Positive Logits
wh        3.50
Wh        1.86
wh        1.85
Wh        1.62
WH        1.39
thw       1.35
whipping  1.16
whe       1.10
th        1.09
whale     1.07
Activations Density: 0.008%