INDEX
Explanations
the concept of fundamental principles or basic knowledge
references to foundational concepts or principles
Negative Logits
asus        -0.70
issions     -0.64
ated        -0.64
oping       -0.62
usable      -0.61
appa        -0.60
imar        -0.59
appiness    -0.59
Sus         -0.58
avement     -0.57
Positive Logits
chool         0.79
layer         0.77
cape          0.77
basics        0.71
linger        0.70
matter        0.69
heet          0.68
yip           0.66
nutshell      0.66
fundamentals  0.64
Activations Density: 0.017%