INDEX
Explanations
specialized terminology or jargon related to technology and media
references to potential threats and complex situations
NEGATIVE LOGITS
âĢ    -0.95
à¨    -0.94
âĸij    -0.81
¯¯    -0.81
à©    -0.78
ntil    -0.77
few    -0.75
à¨    -0.75
âĹ    -0.75
ãĥĥãĥī    -0.75
POSITIVE LOGITS
!:    0.72
!    0.68
!".    0.64
to    0.64
!'    0.62
TO    0.62
To    0.61
!'"    0.60
!]    0.59
Spectre    0.59
Activations Density: 1.124%