INDEX
Explanations
expressions of preference or liking towards various subjects
likes and dislikes
Negative Logits
Token             Logit
BSE               -0.46
TCU               -0.46
Barriers          -0.45
ことはありません  -0.45
SOA               -0.44
EDR               -0.44
Osw               -0.43
FMC               -0.43
MSW               -0.42
Crucible          -0.42
Positive Logits
Token             Logit
Liked             0.80
liked             0.79
liked             0.76
Liked             0.75
Dislikes          0.73
dislike           0.72
Likes             0.68
LIKE              0.68
likes             0.66
likes             0.65
Activations Density: 0.014%