INDEX
Explanations
phrases related to meaningful actions or societal impact
references to the value of life and sacrifice
Negative Logits
Allaah        -0.75
Bronze        -0.67
steroid       -0.66
ether         -0.62
mixer         -0.62
Hof           -0.61
ailability    -0.61
transformer   -0.59
capacitor     -0.59
Masquerade    -0.57
Positive Logits
emulate       0.81
abouts        0.81
tre           0.75
igraph        0.73
intend        0.72
should        0.72
isse          0.71
kiss          0.70
deems         0.70
entails       0.66
Activations Density: 0.885%