Explanations
friendly conversational expressions
conversational expressions and interactions
Negative Logits
Token          Logit
Instr          -0.71
Mobil          -0.70
Footnote       -0.65
Vaugh          -0.62
Nielsen        -0.60
ãĥ¼ãĥĨ         -0.59
Equ            -0.58
Barrett        -0.58
Mobil          -0.58
Restoration    -0.57
Positive Logits
Token          Logit
dont           1.15
english        1.10
doesnt         1.08
didnt          1.04
alot           1.03
tho            1.03
americ         1.02
!!!!           0.96
fuck           0.93
pics           0.93
Activations Density 0.701%