INDEX
Explanations
comparisons of varying degrees between two entities or actions
expressions of intensity or degree in sentiments
Negative Logits
loe            -0.68
Nadu           -0.67
Reincarnated   -0.65
Sorceress      -0.65
Tir            -0.64
utenberg       -0.61
gged           -0.60
PLIED          -0.60
hetical        -0.59
privilege      -0.58
Positive Logits
etheless       0.94
ernaut         0.73
rays           0.72
appropriately  0.71
©¶æ            0.71
ciating        0.69
é£             0.69
veter          0.69
د              0.66
ever           0.65
Activations Density 0.641%