INDEX
Explanations
statements referring to underlying issues or structures
references to fundamental issues or structures that influence various contexts
Negative Logits
asia       -0.85
alde       -0.83
apo        -0.82
aterasu    -0.81
ander      -0.81
cture      -0.79
ishers     -0.79
chens      -0.78
avis       -0.78
alid       -0.74
Positive Logits
assumptions       0.96
principles        0.95
assumption        0.94
infrastructure    0.91
premise           0.91
fundamentals      0.89
structure         0.88
principle         0.87
theme             0.87
motivations       0.85
Activations Density 0.025%