INDEX
Explanations
phrases related to intense or impactful actions or events
expressions related to intensity or significance
Negative Logits
Moroc      -0.73
WARN       -0.65
ã          -0.65
ÂŃ         -0.60
ijah       -0.59
Annex      -0.58
ãĥ¼ãĥĨ     -0.57
elfth      -0.56
Fas        -0.56
Instr      -0.55
Positive Logits
;)         1.13
:)         1.00
!!!!       0.99
tho        0.99
alot       0.98
doesnt     0.97
!!!        0.96
:-)        0.93
!!         0.92
haha       0.91
Activations Density 1.037%