INDEX
Explanations
words related to providing support or evidence for a claim or action
words related to support or endorsement
Negative Logits
ities     -0.80
urious    -0.74
ILCS      -0.73
illin     -0.70
Hots      -0.68
hester    -0.68
alde      -0.67
istics    -0.65
anny      -0.65
thora     -0.64
Positive Logits
backed    0.79
abies     0.79
track     0.77
GROUND    0.77
backing   0.75
drive     0.74
steen     0.72
raise     0.71
swing     0.71
drops     0.69
Activations Density 0.024%