INDEX
Explanations
uncommonly used words or phrases
phrases that indicate relative comparisons or contrasts
Negative Logits
ais         -0.70
visor       -0.69
inus        -0.68
assadors    -0.66
adder       -0.66
ondo        -0.66
frey        -0.65
iseum       -0.65
elin        -0.64
atorium     -0.64
Positive Logits
than        0.99
anymore     0.95
altogether  0.92
WARE        0.87
nor         0.86
erous       0.79
than        0.78
Than        0.77
bother      0.77
comprom     0.71
Activations Density 0.158%
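
As a rough illustration of how a density figure like the one above could be obtained, the sketch below assumes that "activation density" means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the source does not state how its figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 158 of which are active.
acts = [0.0] * 100_000
for i in range(158):
    acts[i * 600] = 1.5  # arbitrary non-zero activation value

# Print the density as a percentage with three decimal places.
print(f"{activation_density(acts):.3%}")
```

Under this reading, a density well below 1% means the feature is sparse: it fires on only a small share of tokens.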