INDEX
Explanations
phrases indicating a strong opinion or warning
phrases indicating the need for clarity or correction in misunderstandings
Negative Logits
ourses       -0.80
iliated      -0.72
ミ           -0.71
und          -0.69
onet         -0.66
affiliated   -0.66
"},"         -0.66
tumblr       -0.65
isson        -0.65
itted        -0.64
Positive Logits
Kev          0.72
math         0.60
!:           0.60
Jude         0.55
caveats      0.54
=]           0.54
folks        0.53
mov          0.53
Galile       0.53
,,,,         0.53
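The page does not say how the logit lists are produced. A common approach for SAE feature dashboards, assumed here, is to project the feature's decoder vector through the model's unembedding matrix (ignoring the final LayerNorm) and report the tokens with the largest and smallest resulting values. The sketch below illustrates that assumption in plain Python; `logit_effects`, `top_and_bottom`, and the toy matrices are hypothetical and are not the dashboard's actual code.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder vector through the unembedding.

    Assumption: unembed[i][j] maps residual-stream dimension i to vocab
    token j (shape d_model x d_vocab). Final LayerNorm is ignored.
    """
    d_model = len(decoder_direction)
    d_vocab = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(d_vocab)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return positive, negative


# Toy example: hypothetical 2-dimensional residual stream, 3-token vocabulary.
decoder = [1.0, -0.5]
W_U = [[0.2, 0.4, -0.6],
       [0.4, -0.2, 0.2]]
effects = logit_effects(decoder, W_U)
pos, neg = top_and_bottom(effects, ["a", "b", "c"], k=1)
print(pos, neg)
```

With `k=10` and a real vocabulary, the two returned lists would correspond to the Positive Logits and Negative Logits shown above, under the stated assumption about how they are computed.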
Activations Density 0.220%
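An activation density of 0.220% means the feature is active on roughly 1 of every 455 tokens sampled (100 / 0.220 ≈ 454.5). Assuming density is defined as the percentage of sampled tokens on which the feature's activation exceeds zero, it could be computed as in the sketch below; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is > 0, which is the usual convention for ReLU-style SAEs.
    """
    if not activations:
        raise ValueError("need at least one activation to compute density")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 22 firing tokens out of 10,000 sampled gives a density of about 0.22%.
sample = [0.0] * 9978 + [0.5] * 22
print(activation_density(sample))
```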