INDEX
Explanations
mentions of significant events or figures in history
Negative Logits
Parables       -0.73
âĵĺ            -0.70
verbs          -0.63
Extras         -0.62
enhagen        -0.62
redes          -0.58
advoc          -0.56
":["           -0.56
possessions    -0.56
titles         -0.55
Positive Logits
000            1.06
040            0.76
500            0.76
400            0.75
600            0.74
045            0.74
800            0.72
070            0.72
048            0.70
ottest         0.70
Activations Density: 0.050%