Explanations
phrases related to similarities or comparisons between different situations or entities
phrases indicating causation or connections between ideas
Negative Logits
worthy      -0.58
âĢIJ         -0.56
roy         -0.56
erity       -0.56
oper        -0.55
hess        -0.54
quality     -0.53
underrated  -0.53
Raphael     -0.52
orial       -0.52
Positive Logits
redes       0.70
plague      0.67
[|          0.65
applies     0.65
uman        0.63
governs     0.63
osponsors   0.62
brates      0.62
ricanes     0.61
romy        0.60
Activations Density: 0.132%