Explanations
academic thinkers and authors
Negative Logits
proof      0.51
fl         0.41
im         0.40
private    0.39
prompt     0.38
perso      0.38
break      0.38
out        0.37
dead       0.37
sets       0.37
Positive Logits
sociologist        0.68
anthropologist     0.63
filóso             0.62
Nietzsche          0.60
anthropologists    0.59
философ            0.59
Sociology          0.59
ética              0.59
philosophers       0.58
मनोविज्ञान          0.58
Activations Density: 0.083%