INDEX
Explanations
references to significant concepts or events in historical contexts
Negative Logits
Token       Logit
-between    -0.18
front       -0.17
Plus        -0.16
stad        -0.15
ico         -0.15
/from       -0.15
ญ           -0.14
Genuine     -0.14
oca         -0.14
Plus        -0.14
Positive Logits
Token       Logit
fact        0.39
effect      0.35
essence     0.35
part        0.31
itself      0.28
turn        0.28
truth       0.28
large       0.27
principle   0.26
theory      0.26
Activations Density 0.222%