INDEX
Explanations
topics related to human society and its implications
repeated mentions of the word "human" in various contexts.
Negative Logits
**)         -0.48
aneurysm    -0.45
**          -0.45
élector     -0.45
oscu        -0.44
mukaan      -0.44
rurale      -0.43
Teach       -0.43
ventre      -0.43
oscura      -0.42
Positive Logits
human       1.71
humans      1.49
human       1.46
HUMAN       1.43
Human       1.39
humano      1.39
Human       1.39
humaine     1.34
humain      1.33
Humans      1.29
Activations Density 0.227%
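
The page does not define activation density. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below illustrates that reading with made-up activations; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 tokens, of which 2 activate the feature.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.227% would mean the feature fires on roughly 2 to 3 tokens per thousand, which fits a feature that responds narrowly to forms of the word "human".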