INDEX
Explanations
self-referential expressions indicating personal feelings and experiences
Negative Logits
ÐĴики           -0.15
ênh             -0.15
okable          -0.15
grunt           -0.15
lamaya          -0.15
owitz           -0.15
ladıģı          -0.14
.scalablytyped  -0.14
eldre           -0.14
ichick          -0.14
Positive Logits
absolutely      0.35
love            0.33
fell            0.30
LOVE            0.29
ad              0.29
falling         0.29
fall            0.27
Love            0.26
loves           0.26
Absolutely      0.26
Activations Density 0.140%