INDEX
Explanations
references to subjects and topics within the text
Negative Logits
ushing          -0.17
ØŃÙĬ            -0.16
ardo            -0.15
/preferences    -0.15
ersions         -0.15
uder            -0.15
zo              -0.15
co              -0.15
lear            -0.15
usp             -0.14
Positive Logits
ivity           0.44
ively           0.42
matter          0.39
matter          0.35
ivities         0.35
ive             0.32
ivism           0.29
Matter          0.29
ivist           0.29
IVE             0.25
Activations Density: 0.015%