INDEX
Explanations
references to the concept of honor
Negative Logits
verſ  -0.66
<unused15>  -0.66
(token not rendered)  -0.66
BSITE  -0.66
queſta  -0.65
<unused57>  -0.65
queſto  -0.65
<unused27>  -0.65
fashiola  -0.65
(token not rendered)  -0.64
Positive Logits
Hon  1.66
honor  1.59
Hon  1.59
honour  1.55
hon  1.55
honoured  1.50
honored  1.50
hon  1.45
honors  1.44
HON  1.43
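Feature dashboards commonly produce positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then reading off the largest and smallest entries. The sketch below assumes that method. `logit_effects`, `top_k`, and the toy vocabulary and weights are hypothetical illustrations, not this dashboard's actual code or data.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumption: `unembed` is laid out like W_U, i.e. d_model rows by
    vocab_size columns, so token t's effect is sum_i dir[i] * W_U[i][t].
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal d_model (rows of unembed)")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_k(tokens: Sequence[str], effects: Sequence[float],
          k: int, largest: bool = True) -> list[tuple[str, float]]:
    """Return the k (token, effect) pairs with the largest (or smallest) effect."""
    pairs = sorted(zip(tokens, effects), key=lambda p: p[1], reverse=largest)
    return pairs[:k]


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    tokens = ["honor", "Hon", "queſta", "cat"]
    decoder_dir = [1.0, 0.5]
    unembed = [[1.2, 1.0, -0.4, 0.1],
               [0.8, 0.6, -0.5, 0.0]]
    effects = logit_effects(decoder_dir, unembed)
    print("positive:", [(t, round(e, 2)) for t, e in top_k(tokens, effects, 2)])
    print("negative:", [(t, round(e, 2)) for t, e in top_k(tokens, effects, 2, largest=False)])
```

A real dashboard would run the same projection over the full vocabulary and keep the top ten entries in each direction, which matches the shape of the two lists above.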
Activation Density: 0.210%
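The density figure can be read as the share of tokens in some sample on which the feature fires. A minimal sketch follows. It assumes that "fires" means an activation strictly above zero and that the sample is a flat list of per-token activations. `activation_density` and the example numbers are hypothetical. At 0.210%, the feature would be active on roughly 21 of every 10,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 10,000 tokens, feature active on 21 of them.
    acts = [0.0] * 9979 + [1.0] * 21
    density = activation_density(acts)
    print(f"Activation Density: {100 * density:.3f}%")
```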