INDEX
Explanations
phrases related to physical actions or confrontations involving objects or people
connective words or phrases that indicate relationships or comparisons
Negative Logits
HAM       -0.65
HI        -0.63
rison     -0.63
raq       -0.58
mma       -0.58
hips      -0.57
Gamble    -0.56
camp      -0.54
ppa       -0.54
lain      -0.54
Positive Logits
the             0.94
the             0.90
rontal          0.72
Corpus          0.71
tha             0.66
largeDownload   0.62
The             0.62
The             0.60
illary          0.60
Õ               0.60
Activations Density: 0.382%