INDEX
Explanations
phrases or descriptions indicating a degree of something being slightly off or problematic
descriptions of things that are slightly problematic or of moderate concern
Negative Logits
apons       -0.89
hire        -0.87
Flavoring   -0.84
itivity     -0.83
itars       -0.80
anwhile     -0.80
andise      -0.78
orsi        -0.77
Siber       -0.77
utenberg    -0.77
Positive Logits
confused        1.12
rusty           1.09
confusing       1.07
bit             1.06
tricky          1.02
misunderstood   1.01
clumsy          1.01
awkward         1.00
prick           0.98
unclear         0.97
Activations Density 0.043%