INDEX
Explanations
mentions of people's names
Negative Logits
Token      Logit
uristic    -0.66
afety      -0.63
teness     -0.59
ocaust     -0.58
":"/       -0.58
humane     -0.57
bara       -0.57
ventory    -0.57
rina       -0.57
=/         -0.57
Positive Logits
Token          Logit
respectively   2.27
jointly        1.45
alike          1.44
together       1.38
respective     1.32
combined       1.29
mutually       1.25
together       1.19
Together       1.17
separately     1.14
Activations Density: 6.197%