INDEX
Explanations
expressions of curiosity or questioning thoughts
Negative Logits
Datuak     -0.73
Astoria    -0.72
aroa       -0.71
%          -0.70
caloosa    -0.65
IRQn       -0.62
receive    -0.61
Alba       -0.58
Alec       -0.58
ibatis     -0.57
Positive Logits
wonder     1.95
Wonder     1.90
wondering  1.87
wonder     1.86
Wonder     1.82
WONDER     1.74
wondered   1.64
wonders    1.57
Wondering  1.55
Wonders    1.45
Activations Density: 0.056%