Explanations
nouns and names of people or historical figures
Negative Logits
Token        Logit
Wadsworth    -0.39
wake         -0.37
vac          -0.37
vas          -0.34
Vaz          -0.34
Vasco        -0.34
ûte          -0.33
Wak          -0.32
Vac          -0.32
Positive Logits
Token        Logit
Will         2.23
Willi        2.16
Wil          2.11
Will         2.09
Wil          2.02
Willi        2.00
WILL         1.98
Willa        1.86
William      1.86
wil          1.85
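The dashboard does not state how these logit values were computed. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and rank every vocabulary token by the result. The sketch below illustrates that idea in plain Python; `logit_effects`, `top_logits`, and the toy inputs are hypothetical. It returns (token, value) pairs rather than a dict because a vocabulary can hold tokens that look identical once whitespace is stripped, as the two "Will" rows above suggest.

```python
# Hypothetical sketch: direct logit effect of one feature, ASSUMING
# effect[t] = dot(decoder_direction, unembed[:, t]).

def logit_effects(decoder_direction, unembed, vocab):
    """Return a list of (token, effect) pairs.

    unembed is a d_model x vocab_size matrix given as a list of rows.
    """
    d_model = len(decoder_direction)
    effects = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with column t of the unembedding.
        value = sum(decoder_direction[i] * unembed[i][t] for i in range(d_model))
        effects.append((token, value))
    return effects


def top_logits(effects, k):
    """Return (most positive k, most negative k), each ordered by magnitude."""
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by effect
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (illustrative values only).
    effects = logit_effects(
        [1.0, -0.5],
        [[2.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]],
        ["Will", "wake", "Vasco"],
    )
    positive, negative = top_logits(effects, 1)
    print(positive)  # [('Will', 2.0)]
    print(negative)  # [('Vasco', -1.0)]
```

With a real model, the same ranking over the full vocabulary would yield tables shaped like the two above; whether this dashboard applies any extra scaling or normalization is not stated.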
Activation density: 0.857%
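Activation density is usually read as the fraction of token positions on which the feature fires, so 0.857% corresponds to roughly 857 of every 100,000 tokens. The sketch below assumes that "fires" means an activation strictly greater than zero; the dashboard does not specify its threshold or the token sample it used, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds threshold.

    ASSUMPTION: a feature "fires" when its activation is strictly above
    threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Toy sample: 1,000 token positions, the feature is active on 8 of them.
    acts = [0.0] * 992 + [1.3] * 8
    print(f"{activation_density(acts):.3%}")  # 0.800%
```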