INDEX
Explanations
the word "whole" followed by a positive adjective
Negative Logits
intent              -0.90
++++++++++++++++    -0.88
Feinstein           -0.88
endants             -0.87
inputs              -0.85
KE                  -0.85
yip                 -0.85
inator              -0.83
Downloadha          -0.82
rf                  -0.82
Positive Logits
heartedly           2.10
hearted             1.51
meal                1.23
Foods               1.13
allo                1.11
whe                 1.07
grown               1.02
ãĤ¨ãĥ«              0.99
swat                0.99
osaurus             0.98
Activations Density: 0.383%