Explanations
phrases related to events or actions that entail a significant impact or change
Negative Logits
lav           -0.92
obal          -0.69
amaru         -0.68
ケ            -0.68
convol        -0.66
riched        -0.64
contracted    -0.64
ollo          -0.63
ーティ        -0.63
ザ            -0.63
Positive Logits
!              0.85
!.             0.78
.              0.78
offensively    0.74
.#             0.72
!,             0.72
!!!            0.72
¯              0.71
!:             0.71
;)             0.70
Activations Density 2.493%
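
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which the feature's activation is above a threshold. This is an illustrative assumption, not necessarily how this dashboard derives its value; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires; the dashboard may define or sample it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 of 40 token positions.
toy_activations = [0.0] * 39 + [1.7]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints "2.500%"
```

Under this reading, a density of about 2.5% would mean the feature is active on roughly one token in forty.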