INDEX
Explanations
questions or inquiries
Negative Logits
Token       Logit
apers       -0.77
ality       -0.77
apan        -0.73
aper        -0.72
alities     -0.66
ened        -0.64
Nadu        -0.64
poral       -0.64
orough      -0.63
wedge       -0.61
Positive Logits
Token       Logit
Nope        1.48
Nah         1.22
Yep         1.14
Yeah        1.13
����        1.13
Absolutely  1.05
Huh         1.00
Yes         0.99
Probably    0.98
Really      0.98
Activations Density 0.071%