INDEX
Explanations
references to the foundational or fundamental aspects of something
phrases that discuss foundational arguments or justifications
Negative Logits
opers         -0.87
oping         -0.79
ipop          -0.73
cha           -0.73
Wend          -0.72
oho           -0.71
Rosenberg     -0.71
estern        -0.69
andro         -0.68
osen          -0.68
Positive Logits
basis          0.93
underpin       0.79
foundation     0.79
plates         0.77
grounding      0.75
premise        0.75
thereof        0.73
grounds        0.72
footing        0.71
assumptions    0.69
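The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. Below is a minimal sketch of how such lists are commonly produced. It assumes that a feature's direct logit effect is its decoder direction multiplied by the model's unembedding matrix; the page does not state its exact method. The function name top_logit_effects and the toy matrices are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=5):
    # Assumption: a feature's direct logit effect is its decoder direction
    # (shape: d_model) projected through the unembedding (d_model x vocab_size).
    effects = decoder_dir @ W_U                # one score per vocabulary token
    order = np.argsort(effects)                # token indices, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Hypothetical toy model: d_model = 2, four-token vocabulary.
vocab = ["basis", "foundation", "opers", "cha"]
W_U = np.array([[0.9, 0.8, -0.8, -0.7],
                [0.1, 0.2,  0.3,  0.4]])
decoder_dir = np.array([1.0, 0.0])

positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(positive)  # [('basis', 0.9), ('foundation', 0.8)]
print(negative)  # [('opers', -0.8), ('cha', -0.7)]
```

A single argsort serves both lists: the first k indices give the most negative effects, and the reversed order gives the most positive.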
Activation Density: 0.031%
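Activation density is the share of token positions on which the feature fires. A value of 0.031% corresponds to roughly 31 activations per 100,000 tokens. Below is a minimal sketch that assumes "fires" means an activation strictly above zero, which is the usual reading for a ReLU-style sparse autoencoder. The function activation_density and its threshold parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    # Assumption: a feature "fires" on a token when its activation is
    # strictly greater than `threshold` (0.0 for a ReLU-style SAE).
    acts = np.asarray(acts, dtype=float)
    active = np.count_nonzero(acts > threshold)
    return 100.0 * active / acts.size          # percentage of token positions

# Hypothetical activations over 100,000 tokens, 31 of them nonzero.
acts = np.zeros(100_000)
acts[:31] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.031%
```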