INDEX
Explanations
positive or impactful actions or characteristics described in text
actions or events that have a significant impact or influence on circumstances
NEGATIVE LOGITS
arta       -0.80
>]         -0.72
ilings     -0.69
picture    -0.66
guide      -0.63
secondary  -0.61
photo      -0.61
}}         -0.61
device     -0.60
oscope     -0.60
POSITIVE LOGITS
raining     0.80
downhill    0.69
doub        0.68
uphill      0.68
escap       0.65
overload    0.63
triv        0.63
atche       0.63
impossible  0.62
フォ        0.62
Activations Density 0.925%
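
The density figure above is easier to read with a concrete calculation. The following is a minimal sketch, assuming that "activations density" means the fraction of sampled token positions on which the feature's activation is above zero; that definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all illustrative assumptions, not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density is "share of tokens where the feature fires",
    with "fires" meaning activation > threshold (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 9 of which activate the feature.
sample = [0.0] * 991 + [1.5] * 9
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.925% would mean the feature is active on roughly 9 of every 1000 sampled tokens, which is typical of a sparse feature.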