INDEX
Explanations
conditional statements indicating potential situations or actions
conditional phrases that express hypothetical situations
Negative Logits
Token        Logit
Els          -0.76
OPLE         -0.75
Rated        -0.62
OWN          -0.61
SPONSORED    -0.60
Canary       -0.56
Cu           -0.55
OTO          -0.55
arsity       -0.55
oller        -0.55
Positive Logits
Token        Logit
disclaim     0.75
math         0.65
claimer      0.61
ndra         0.59
tc           0.59
sv           0.58
apor         0.58
otal         0.57
ertain       0.56
estial       0.56
Activations Density 0.112%
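
As a rough illustration of what a density figure like the one above could mean, the sketch below assumes the common reading: the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from this page; the page itself does not state how its figure was computed.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, 11 of which activate the feature.
sample = [0.0] * 9989 + [1.5] * 11
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.110%
```

Under this reading, a density of 0.112% would correspond to the feature firing on roughly 11 out of every 10,000 tokens.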