Explanations
references to principles and foundational concepts
Negative Logits
gie             -0.19
akan            -0.16
NESS            -0.15
itude           -0.15
ney             -0.15
ÑĢÑĥ            -0.15
lord            -0.14
ropolis         -0.14
504             -0.14
raham           -0.14
Positive Logits
-agent           0.30
ities            0.24
investigator     0.21
-Agent           0.19
ps               0.19
Investig         0.18
stown            0.18
pal              0.18
/ss              0.17
ized             0.16
Activations Density 0.020%