INDEX
Explanations
Words and phrases related to bodily functions, particularly urination and defecation.
Negative Logits
aido                  -0.83
Flavoring             -0.79
arya                  -0.77
ensen                 -0.73
ician                 -0.68
ICA                   -0.67
ahn                   -0.65
[token not rendered]  -0.63
illian                -0.62
*/(                   -0.62
Positive Logits
pee                    1.33
pee                    1.10
pees                   0.97
ples                   0.96
bott                   0.90
ves                    0.87
ble                    0.86
cess                   0.85
ved                    0.84
š                      0.84
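The page does not say how these logit values were computed. A common approach for sparse-autoencoder features, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then list the k highest scores (positive logits) and k lowest (negative logits). The sketch below shows only that ranking step. The function name, the toy 2-dimensional vectors, and the dict-based unembedding are hypothetical illustrations, not this page's data or pipeline.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_and_bottom_logits(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: score each token by dot(decoder_dir, unembedding)
    and return the k highest (positive) and k lowest (negative) scores."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]        # most negative logit effects
    top = ranked[::-1][:k]     # most positive logit effects
    return top, bottom


# Toy example with made-up 2-d vectors (assumption, not real model weights).
decoder_dir = [1.0, 0.5]
unembed = {
    "pee": [1.0, 0.6],
    "ble": [0.4, 0.2],
    "aido": [-0.5, -0.6],
}
top, bottom = top_and_bottom_logits(decoder_dir, unembed, k=1)
print(top, bottom)
```

Real pipelines may also apply the model's final normalization before the unembedding, which this sketch omits.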
Activation density: 0.006%
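Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The page does not state its threshold or sample size, so both the function and the 100,000-token example are hypothetical. Under that assumption, 0.006% corresponds to roughly 6 firing tokens per 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Made-up sample: 6 firing tokens out of 100,000.
acts = [0.0] * 99_994 + [1.2] * 6
print(activation_density(acts))  # 0.006
```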