INDEX
Explanations
phrases expressing admiration or surprise
phrases expressing strong opinions or reactions about various subjects
Negative Logits
é¾įå¥ij士    -0.77
¶           -0.72
Oswald      -0.67
76561       -0.65
ILCS        -0.65
CI          -0.62
Entered     -0.61
\<          -0.61
Freddie     -0.61
=~          -0.59
Positive Logits
coincidence  1.01
ils          0.97
uras         0.86
wonderful    0.86
lot          0.84
bunch        0.81
lovely       0.81
terrific     0.79
marvelous    0.78
fantastic    0.78
Activations Density: 0.032%