INDEX
Explanations
sentences related to legal matters and consequences
references to potential harmful actions or conditions, particularly related to health and societal issues
Negative Logits
Carnage      -0.73
Skydragon    -0.72
Allies       -0.72
eers         -0.70
Survivors    -0.65
oday         -0.65
ESCO         -0.64
ccording     -0.63
ĸļ           -0.62
ebus         -0.62
POSITIVE LOGITS
wu           0.72
|            0.69
nob          0.65
gmaxwell     0.63
bas          0.62
uno          0.61
hao          0.59
prin         0.59
<<           0.57
*            0.56
Activations Density 0.130%