Explanations
mentions of the word "water"
mentions of the word "ater", likely indicating references to water or related topics
Negative Logits
Disk        -0.68
ured        -0.67
ures        -0.66
diabetic    -0.64
Chao        -0.62
INS         -0.62
fung        -0.61
sanity      -0.60
Ko          -0.60
Pigs        -0.60
Positive Logits
pillar      1.09
IAL         0.97
ater        0.93
apy         0.93
ickson      0.90
idon        0.88
eor         0.86
iatus       0.85
apter       0.84
ites        0.84
Activations Density: 0.018%