Explanations
references to hierarchical structures or titles related to authority or accomplishment
Negative Logits
ABE          -0.73
Kamp         -0.72
ORTS         -0.67
TPS          -0.66
uana         -0.65
Citation     -0.65
Dragonbound  -0.63
Bach         -0.61
ensation     -0.61
Soy          -0.60
Positive Logits
ipel          1.38
itect         1.15
bishop        1.13
ival          1.10
ivist         0.99
adia          0.98
rival         0.96
aic           0.95
di            0.93
iving         0.84
Activation Density: 0.006%
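A density of 0.006% means the feature fires very rarely, roughly 6 tokens in every 100,000. The sketch below shows the arithmetic, assuming density is the percentage of sampled tokens whose activation exceeds zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical illustrations, not the dashboard's actual code.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which a feature is active.

    Assumption: a token counts as "active" when its activation is
    strictly greater than `threshold` (hypothetical default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 6 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(6):
    acts[i * 10_000] = 1.5  # arbitrary positive activation value

print(f"{activation_density(acts):.3f}%")  # prints 0.006%
```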