INDEX
Explanations
expressions of emotional or sensory experiences associated with "feel."
Negative Logits
eteria    -0.07
quo       -0.07
idth      -0.07
же        -0.07
们        -0.07
éļĽ       -0.07
ũi        -0.07
isko      -0.07
acters    -0.07
olars     -0.07
Positive Logits
ings      0.10
-good     0.08
inspace   0.08
illy      0.07
ingly     0.07
afort     0.07
lessly    0.07
604       0.07
424       0.07
ingen     0.06
Activations Density 0.005%