INDEX
Explanations
phrases indicating the presence of specific individuals in different settings
references to group dynamics or hierarchical structures in various contexts
Negative Logits
uristic       -0.68
Topics        -0.68
ILA           -0.65
brate         -0.63
iu            -0.61
lessness      -0.60
sever         -0.59
Extensions    -0.58
Phys          -0.58
ometers       -0.57
Positive Logits
midst         1.21
room          1.15
closet        1.08
foreground    1.05
doorway       1.03
vicinity      0.99
fray          0.98
trenches      0.97
womb          0.96
picture       0.94
Activations Density: 0.218%