Explanations
phrases mentioning legal and bureaucratic terms
words or phrases related to significant societal issues or threats
Negative Logits
sacrific     -0.82
ende         -0.80
reflection   -0.76
obser        -0.76
readable     -0.74
unconscious  -0.74
floppy       -0.73
outline      -0.73
canv         -0.72
confir       -0.71
Positive Logits
¯            1.16
âĢł          0.95
¶            0.87
ï¸ı          0.85
âģ           0.85
°            0.85
ï¸           0.84
§            0.84
SHIP         0.84
¨            0.83
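Several positive-logit tokens above (âĢł, ï¸ı, ï¸, âģ) look garbled because they appear in byte-level BPE form, where each UTF-8 byte is shown as a printable stand-in character. The Python sketch below recovers the underlying bytes. It assumes the model uses GPT-2's byte-to-unicode table; the âĢ-style strings suggest this, but the page does not name the tokenizer. It also assumes that single symbols such as ¶ and § are raw vocabulary entries. If the page already decoded them, those rows mean the ordinary characters instead. The helper names are hypothetical.

```python
# Assumption: tokens use GPT-2's byte-level BPE alphabet (bytes_to_unicode).

def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's reversible byte -> printable-character table."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Non-printable bytes are shifted up to code points 256, 257, ...
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Map each stand-in character of a vocab entry back to its byte."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def describe(token: str) -> str:
    """Return the token's code points, or its hex bytes if not valid UTF-8."""
    raw = token_to_bytes(token)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Partial multi-byte sequence or lone continuation byte.
        return "not valid UTF-8 alone: " + raw.hex(" ")
    return " ".join(f"U+{ord(c):04X}" for c in text)


for tok in ["¯", "âĢł", "ï¸ı", "ï¸", "âģ", "SHIP"]:
    print(tok, "->", describe(tok))
```

Under these assumptions, âĢł decodes to U+2020 (†, dagger) and ï¸ı decodes to U+FE0F (variation selector-16). By contrast, ï¸, âģ, and lone symbols such as ¯ are fragments of multi-byte characters, not complete characters.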
Activation Density: 0.221%