Explanations
expressions of personal opinions and experiences
Negative Logits
Token        Logit
xCD          -0.13
antee        -0.13
urgency      -0.13
utterstock   -0.13
tabl         -0.13
Ø            -0.12
urgence      -0.12
кÑĥÑĤ       -0.12
urgent       -0.12
>(()         -0.12
Positive Logits
Token        Logit
like         0.47
liked        0.43
likes        0.40
liking       0.39
enjoy        0.37
Like         0.36
LIKE         0.36
liked        0.35
enjoyed      0.35
love         0.34
Activations Density: 0.316%