INDEX
Explanations
sentences indicating a similar situation, fact, or statement
phrases that express equivalence or similarity
Negative Logits
ILCS              -0.68
++++++++++++++++  -0.59
ded               -0.54
vironment         -0.54
Str               -0.53
itory             -0.52
distraction       -0.50
zos               -0.50
Slim              -0.50
Bones             -0.50
Positive Logits
applies     1.06
apply       0.99
applied     0.86
applicable  0.74
prev        0.73
happ        0.73
Applic      0.69
aila        0.68
prevailed   0.68
iat         0.68
Activations Density 0.201%
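
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of token positions on which the feature fires at all. This is an assumption about the metric, not a definition confirmed by this page; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The dashboard may use a different definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2 of 1000 token positions.
sample = [0.0] * 998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.201% would mean the feature is active on roughly 2 of every 1000 tokens.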