INDEX
Explanations
descriptive terms or actions related to disruptive or damaging transformation processes
terms related to degradation or decline
New Auto-Interp
Negative Logits
OWS          -0.83
Reviewer     -0.82
razil        -0.80
glers        -0.75
STER         -0.71
Holmes       -0.71
aneers       -0.69
ONY          -0.66
amia         -0.66
intendent    -0.65
Positive Logits
ync           0.98
resil         0.92
ktop          0.91
irable        0.88
perate        0.87
embr          0.86
erve          0.85
semb          0.83
erving        0.80
iple          0.79
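The two lists rank vocabulary tokens by how strongly the feature pushes their logits down (negative) or up (positive). A common way to build such lists, assumed here and not confirmed by the card itself, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest entries. The sketch below is pure Python on a toy 2-dimensional model with a 6-token vocabulary. The names `d_dec`, `W_U`, `feature_logit_effects`, and `top_and_bottom_logits`, and all numbers in the toy example, are hypothetical. The sketch does not reproduce the values shown above.

```python
from typing import List, Sequence, Tuple

def feature_logit_effects(d_dec: Sequence[float],
                          W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a decoder direction (length d_model) through W_U (d_model x vocab_size).

    Entry j is the direct effect of the feature direction on token j's logit.
    """
    vocab_size = len(W_U[0])
    return [sum(d_dec[i] * W_U[i][j] for i in range(len(d_dec)))
            for j in range(vocab_size)]

def top_and_bottom_logits(d_dec, W_U, vocab, k=10) -> Tuple[list, list]:
    """Return the k most negative and k most positive (token, logit) pairs."""
    logits = feature_logit_effects(d_dec, W_U)
    order = sorted(range(len(vocab)), key=lambda j: logits[j])  # ascending by logit
    negative = [(vocab[j], logits[j]) for j in order[:k]]            # most suppressed first
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]  # most promoted first
    return negative, positive

# Hypothetical toy model: d_model = 2, vocabulary of 6 placeholder tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d", "tok_e", "tok_f"]
W_U = [[2.0, 0.0, -1.0, 0.0, 0.0, 1.0],
       [1.0, 3.0, 0.0, -4.0, 1.0, 0.0]]
d_dec = [2.0, 1.0]

negative, positive = top_and_bottom_logits(d_dec, W_U, vocab, k=2)
print(negative)  # → [('tok_d', -4.0), ('tok_c', -2.0)]
print(positive)  # → [('tok_a', 5.0), ('tok_b', 3.0)]
```

On a real model, `W_U` would have tens of thousands of columns, and dashboards often normalize or rescale the scores before display. That could explain why the card's values fall within roughly ±1, though the card does not say.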
Activations Density 0.005%
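Activation density is usually the fraction of token positions on which the feature fires. This reading is assumed here, since the card does not define it or name the dataset it was measured on. Under that reading, 0.005% is 5 × 10⁻⁵, or about one token in 20,000. The sketch below computes this fraction for a toy activation stream. `activation_density` and its `threshold` parameter are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy stream: 40,000 token positions, and the feature fires on 2 of them.
acts = [0.0] * 40_000
acts[123] = 1.7
acts[30_001] = 0.4

density = activation_density(acts)
print(f"{density:.3%}")  # → 0.005%
```

A density this low marks the feature as very sparse. Its dashboard statistics then come from only a handful of activating contexts per million tokens.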