INDEX
Explanations
phrases related to philosophical concepts
concepts and discussions related to philosophy
NEGATIVE LOGITS
ells      -0.86
GV        -0.82
sg        -0.79
esty      -0.77
redd      -0.74
rake      -0.74
ords      -0.72
eor       -0.71
ell       -0.70
ardless   -0.69
POSITIVE LOGITS
ophical         1.00
philosophical   0.97
philosopher     0.93
philosophers    0.91
curiosity       0.90
philosoph       0.83
theoret         0.82
philosophy      0.80
dile            0.80
inco            0.80
Activations Density 0.006%