Explanations
religious or moral terms
words and phrases related to various forms of seriousness or severity
Negative Logits
avia      -0.69
door      -0.69
WH        -0.66
©¶æ       -0.66
oan       -0.62
AW        -0.62
APS       -0.62
bsite     -0.62
aver      -0.61
ARC       -0.60
Positive Logits
ness       1.47
nesses     1.28
ity        1.12
ities      0.90
ly         0.88
NESS       0.81
Magikarp   0.80
ous        0.77
lihood     0.76
liness     0.75
Activation Density: 0.064%
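The two fields above can be illustrated with a minimal Python sketch, under stated assumptions. It assumes the feature's per-token logit effects are already available as a dict (the values below are a subset of those listed above), and it assumes "activation density" means the fraction of tokens on which the feature's activation exceeds zero. The names `top_logits`, `activation_density`, and `EFFECTS` are hypothetical and do not belong to any dashboard API.

```python
# Hypothetical sketch: rank logit effects and compute activation density.
# Assumes effects are given as {token: logit_effect} and that a feature
# "fires" on a token when its activation is strictly above a threshold.

EFFECTS = {
    "ness": 1.47, "nesses": 1.28, "ity": 1.12,   # subset of positive logits above
    "avia": -0.69, "door": -0.69, "WH": -0.66,   # subset of negative logits above
}


def top_logits(effects: dict[str, float], k: int = 10):
    """Return the k most positive and k most negative (token, value) pairs."""
    # sorted() is stable, so tied values keep their original insertion order.
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    pos, neg = top_logits(EFFECTS, k=2)
    print("positive:", pos)
    print("negative:", neg)
    # 64 firing tokens out of 100,000 gives a density of 0.064%.
    print(f"{activation_density([1.0] * 64 + [0.0] * 99936):.3%}")
```

Under this definition, the 0.064% figure corresponds to roughly 64 firing tokens per 100,000 tokens seen, which marks this as a sparsely active feature.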